{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
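    "# 1. Data Processing\n",
    "\n",
    "## 1.1 Loading Data\n",
    "Before we can do anything else with the collected data, we must load it into memory. pandas provides many functions for reading data from different kinds of sources; the most commonly used are:\n",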
    "* read_csv\n",
    "* read_table\n",
    "* read_sql\n",
    "\n",
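    "Note:\n",
    "* read_csv and read_table differ only in their default separator: read_csv splits fields on commas, while read_table splits them on tab characters.\n",
    "\n",
    "### Common parameters\n",
    "Parameters commonly used with read_csv and read_table:\n",
    "- sep / delimiter: the field separator\n",
    "- header: the row to use as column labels (None if the file has no header row)\n",
    "- names: the column labels to use\n",
    "- index_col: the column(s) to use as the row index\n",
    "- usecols: the subset of columns to read"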
   ]
  },
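  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference in default separators can be seen on a small in-memory sample. This is a minimal sketch: the tab-separated text is a hypothetical stand-in for a real file, and the tab and newline characters are built with `chr()` only to keep the string literals simple.\n",
    "\n",
    "```python\n",
    "from io import StringIO\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "TAB, NL = chr(9), chr(10)  # tab and newline characters\n",
    "# Hypothetical tab-separated sample: a header row and two data rows\n",
    "tab_text = NL.join(['id' + TAB + 'name', '1' + TAB + 'a', '2' + TAB + 'b'])\n",
    "\n",
    "# read_table splits on tabs by default\n",
    "by_table = pd.read_table(StringIO(tab_text))\n",
    "# read_csv splits on commas by default; sep=TAB makes it behave like read_table\n",
    "by_csv = pd.read_csv(StringIO(tab_text), sep=TAB)\n",
    "# Without sep, read_csv finds no commas and reads each line as a single column\n",
    "one_col = pd.read_csv(StringIO(tab_text))\n",
    "\n",
    "print(by_table.shape, by_csv.shape, one_col.shape)\n",
    "```\n",
    "\n",
    "In other words, `pd.read_table(path)` and `pd.read_csv(path, sep=TAB)` should give the same result."
   ]
  },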
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>95002</td>\n",
       "      <td>刘晨</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>95017</td>\n",
       "      <td>王风娟</td>\n",
       "      <td>女</td>\n",
       "      <td>18</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>95018</td>\n",
       "      <td>王一</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>95013</td>\n",
       "      <td>冯伟</td>\n",
       "      <td>男</td>\n",
       "      <td>21</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>95014</td>\n",
       "      <td>王小丽</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       0    1  2   3   4\n",
       "0  95002   刘晨  女  19  IS\n",
       "1  95017  王风娟  女  18  IS\n",
       "2  95018   王一  女  19  IS\n",
       "3  95013   冯伟  男  21  CS\n",
       "4  95014  王小丽  女  19  CS"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Read a CSV file; the result is a DataFrame.\n",
    "# By default the first line is treated as the header row. If the file has no header, pass header=None.\n",
    "df = pd.read_csv(r\"c:/student.csv\", header=None)\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>95002</td>\n",
       "      <td>刘晨</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>95017</td>\n",
       "      <td>王风娟</td>\n",
       "      <td>女</td>\n",
       "      <td>18</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>95018</td>\n",
       "      <td>王一</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>95013</td>\n",
       "      <td>冯伟</td>\n",
       "      <td>男</td>\n",
       "      <td>21</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>95014</td>\n",
       "      <td>王小丽</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       0    1  2   3   4\n",
       "0  95002   刘晨  女  19  IS\n",
       "1  95017  王风娟  女  18  IS\n",
       "2  95018   王一  女  19  IS\n",
       "3  95013   冯伟  男  21  CS\n",
       "4  95014  王小丽  女  19  CS"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# read_csv uses a comma as the separator by default; use sep (or its alias delimiter) to specify a different one.\n",
    "df = pd.read_csv(r\"c:/student.csv\", header=None, sep=\",\")\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>学号</th>\n",
       "      <th>姓名</th>\n",
       "      <th>性别</th>\n",
       "      <th>年龄</th>\n",
       "      <th>部门</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>95002</td>\n",
       "      <td>刘晨</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>95017</td>\n",
       "      <td>王风娟</td>\n",
       "      <td>女</td>\n",
       "      <td>18</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>95018</td>\n",
       "      <td>王一</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>95013</td>\n",
       "      <td>冯伟</td>\n",
       "      <td>男</td>\n",
       "      <td>21</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>95014</td>\n",
       "      <td>王小丽</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      学号   姓名 性别  年龄  部门\n",
       "0  95002   刘晨  女  19  IS\n",
       "1  95017  王风娟  女  18  IS\n",
       "2  95018   王一  女  19  IS\n",
       "3  95013   冯伟  男  21  CS\n",
       "4  95014  王小丽  女  19  CS"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# With header=None, read_csv generates integer column labels (0, 1, 2, 3, ...). Use names to supply your own labels (here: student ID, name, sex, age, department).\n",
    "df = pd.read_csv(r\"c:/student.csv\", header=None, names=[\"学号\", \"姓名\", \"性别\", \"年龄\", \"部门\"])\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>95002</th>\n",
       "      <td>刘晨</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95017</th>\n",
       "      <td>王风娟</td>\n",
       "      <td>女</td>\n",
       "      <td>18</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95018</th>\n",
       "      <td>王一</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>IS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95013</th>\n",
       "      <td>冯伟</td>\n",
       "      <td>男</td>\n",
       "      <td>21</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95014</th>\n",
       "      <td>王小丽</td>\n",
       "      <td>女</td>\n",
       "      <td>19</td>\n",
       "      <td>CS</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         1  2   3   4\n",
       "0                    \n",
       "95002   刘晨  女  19  IS\n",
       "95017  王风娟  女  18  IS\n",
       "95018   王一  女  19  IS\n",
       "95013   冯伟  男  21  CS\n",
       "95014  王小丽  女  19  CS"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Row labels are generated automatically by default (0, 1, 2, 3, ...). To use a column as the row index instead\n",
    "# (for example, the primary key of a database table), set index_col.\n",
    "df = pd.read_csv(r\"c:/student.csv\", header=None, index_col=0)\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>95002</th>\n",
       "      <td>刘晨</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95017</th>\n",
       "      <td>王风娟</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95018</th>\n",
       "      <td>王一</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95013</th>\n",
       "      <td>冯伟</td>\n",
       "      <td>男</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95014</th>\n",
       "      <td>王小丽</td>\n",
       "      <td>女</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         1  2\n",
       "0            \n",
       "95002   刘晨  女\n",
       "95017  王风娟  女\n",
       "95018   王一  女\n",
       "95013   冯伟  男\n",
       "95014  王小丽  女"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# usecols selects which columns to read. If a column is also used as the index (index_col), it must be included in usecols as well.\n",
    "df = pd.read_csv(r\"c:/student.csv\", header=None, index_col=0, usecols=[0, 1, 2])\n",
    "display(df.head())"
   ]
  },
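  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The examples above need `c:/student.csv` on disk. The sketch below combines `header`, `names`, `index_col` and `usecols` on a hypothetical two-row in-memory sample (the English column labels are illustrative only), so it runs without that file. It assumes that `usecols` and `index_col` may refer to the labels given in `names`.\n",
    "\n",
    "```python\n",
    "from io import StringIO\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical stand-in for student.csv: no header row, comma-separated\n",
    "csv_text = chr(10).join(['95002,Liu,F,19,IS', '95013,Feng,M,21,CS'])\n",
    "\n",
    "df = pd.read_csv(\n",
    "    StringIO(csv_text),\n",
    "    header=None,                                  # the first line is data, not a header\n",
    "    names=['id', 'name', 'sex', 'age', 'dept'],  # our own column labels\n",
    "    usecols=['id', 'name', 'age'],               # columns to keep, index column included\n",
    "    index_col='id',                               # use id as the row index\n",
    ")\n",
    "print(df)\n",
    "```"
   ]
  },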
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reading data from a database. Like read_csv, read_sql returns a DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>name</th>\n",
       "      <th>age</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>33</td>\n",
       "      <td>sk</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   id name  age\n",
       "0  33   sk   17"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "import sqlite3\n",
    "con = sqlite3.connect(\"test.db\")\n",
    "con.execute(\"drop table if exists person\")\n",
    "con.commit()\n",
    "# Create the table\n",
    "con.execute(\"create table person(id int primary key, name varchar(30), age int)\")\n",
    "# Insert a row\n",
    "con.execute(\"insert into person(id, name, age) values(33, 'sk', 17)\")\n",
    "# Read from the database: the SQL query selects the data used to build the DataFrame; con is the database connection.\n",
    "t = pd.read_sql(\"select id, name, age from person\", con)\n",
    "display(t)"
   ]
  },
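  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A variant of the query above, as a minimal sketch: it uses a throwaway in-memory SQLite database (so `test.db` is not touched), the `params` argument of `read_sql` to fill `?` placeholders instead of building SQL strings by hand, and `index_col` to turn the primary key into the row index. The second row is hypothetical sample data.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "con = sqlite3.connect(':memory:')  # in-memory database, discarded on close\n",
    "con.execute('create table person(id int primary key, name varchar(30), age int)')\n",
    "con.executemany('insert into person(id, name, age) values(?, ?, ?)',\n",
    "                [(33, 'sk', 17), (34, 'lm', 15)])\n",
    "con.commit()\n",
    "\n",
    "# params fills the ? placeholder; index_col makes id the row index\n",
    "older = pd.read_sql('select id, name, age from person where age > ?', con,\n",
    "                    params=(16,), index_col='id')\n",
    "con.close()\n",
    "print(older)\n",
    "```"
   ]
  },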
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
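    "### Writing files\n",
    "The to_csv method of DataFrame and Series writes data to a file or to any writable stream.\n",
    "* to_csv\n",
    "\n",
    "### Common parameters\n",
    "* sep: the field separator.\n",
    "* header: whether to write the column labels, default True.\n",
    "* na_rep: the string written for missing values (default: an empty string).\n",
    "* index: whether to write the row index, default True.\n",
    "* index_label: the header of the index column.\n",
    "* columns: the columns to write; by default all columns are written."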
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>10</td>\n",
       "      <td>11</td>\n",
       "      <td>12</td>\n",
       "      <td>13</td>\n",
       "      <td>14</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    0   1   2   3   4   5\n",
       "0   0   1   2   3   4 NaN\n",
       "1   5   6   7   8   9 NaN\n",
       "2  10  11  12  13  14 NaN"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.DataFrame(np.arange(15).reshape(3, 5))\n",
    "df[5] = np.nan\n",
    "display(df)\n",
    "\n",
    "# A comma is the default separator; sep sets a custom one.\n",
    "df.to_csv(\"data1.csv\", sep=\"-\")\n",
    "\n",
    "# The header (the column labels) is written by default. header=True writes it (default); header=False omits it.\n",
    "df.to_csv(\"data2.csv\", header=False)\n",
    "\n",
    "# Missing values are written as empty strings by default; na_rep sets the text written in their place (here 空, 'empty').\n",
    "df.to_csv(\"data3.csv\", header=False, na_rep=\"空\")\n",
    "\n",
    "# The row index is written by default. index=True writes it (default); index=False omits it.\n",
    "df.to_csv(\"data4.csv\", header=False, na_rep=\"空\", index=False)\n",
    "\n",
    "# index_label sets the header of the index column.\n",
    "df.to_csv(\"data5.csv\", index_label=\"index_name\")\n",
    "\n",
    "# columns selects which columns are written; all columns are written by default.\n",
    "df.to_csv(\"data6.csv\", columns=[1, 3], header=False, na_rep=\"空\", index=False)"
   ]
  },
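  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When no path is given, `to_csv` returns the CSV text as a string, which is a convenient way to check what each parameter does before writing a file. A minimal sketch using the same DataFrame as above; `na_rep='NA'` is an arbitrary choice, and `splitlines()` is used because the line terminator can differ between operating systems.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.arange(15).reshape(3, 5))\n",
    "df[5] = np.nan\n",
    "\n",
    "# Defaults: comma separator, header and index written, NaN written as an empty field\n",
    "default_text = df.to_csv()\n",
    "# Only columns 1 and 5, no header, no index, NaN written as 'NA'\n",
    "custom_text = df.to_csv(columns=[1, 5], header=False, index=False, na_rep='NA')\n",
    "\n",
    "print(default_text.splitlines()[:2])\n",
    "print(custom_text.splitlines())\n",
    "```"
   ]
  },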
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "61"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "',0,1,2,3,4,5\\r\\n0,0,1,2,3,4,\\r\\n1,5,6,7,8,9,\\r\\n2,10,11,12,13,14,\\r\\n'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "',0,1,2,3,4,5\\r\\n0,0,1,2,3,4,\\r\\n1,5,6,7,8,9,\\r\\n2,10,11,12,13,14,\\r\\n'"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
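    "# to_csv can write not only to a file on disk but also to an in-memory file-like object, which avoids disk I/O.\n",
    "# A file-like object is one that provides file methods such as read and write.\n",
    "\n",
    "# StringIO holds text\n",
    "# BytesIO holds binary data\n",
    "from io import StringIO, BytesIO\n",
    "# Create a file-like object\n",
    "str_io = StringIO()\n",
    "df.to_csv(str_io)\n",
    "\n",
    "# Check the current position of the file pointer (after writing, it is at the end)\n",
    "display(str_io.tell())\n",
    "\n",
    "# Move the file pointer back to the beginning so that read() returns the whole content.\n",
    "str_io.seek(0)\n",
    "display(str_io.read())\n",
    "\n",
    "# getvalue returns the entire buffer regardless of the pointer position, so no seek is needed.\n",
    "display(str_io.getvalue())"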
   ]
  },
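  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because `read_csv` also accepts file-like objects, the buffer can be read back without touching the disk. A minimal sketch with a small hypothetical DataFrame; note that the column labels come back as strings, since they are parsed from the CSV text.\n",
    "\n",
    "```python\n",
    "from io import StringIO\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame(np.arange(6).reshape(2, 3))\n",
    "\n",
    "buf = StringIO()\n",
    "df.to_csv(buf)                            # the pointer is now at the end of the buffer\n",
    "buf.seek(0)                               # rewind, otherwise read_csv sees an empty stream\n",
    "restored = pd.read_csv(buf, index_col=0)  # the first column holds the written index\n",
    "\n",
    "print(restored)\n",
    "```"
   ]
  },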
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
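    "# 2. Data Cleaning\n",
    "Data usually needs some preprocessing before it can be used in later analysis or machine learning, because whatever its source, we can never guarantee that it is completely accurate.  \n",
    "Data cleaning typically covers:\n",
    "* Handling missing values\n",
    "* Handling outliers\n",
    "* Handling duplicates\n",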
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
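    "## 2.1 Handling Missing Values\n",
    "### Detecting missing values\n",
    "pandas treats the float value NaN and None as missing values. The following methods help detect them:\n",
    "* info\n",
    "* isnull\n",
    "* notnull\n",
    "\n",
    "Note:\n",
    "* To check whether any (or all) values are missing, combine isnull with any or all."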
   ]
  },
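  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The checks listed above can be combined as follows. A minimal sketch on a small hypothetical DataFrame that contains both `np.nan` and `None`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "df = pd.DataFrame({'a': [1.0, np.nan, 3.0],\n",
    "                   'b': ['x', None, 'z'],\n",
    "                   'c': [1, 2, 3]})\n",
    "\n",
    "print(df.isnull().sum())          # number of missing values per column\n",
    "print(df.isnull().any())          # does each column contain a missing value?\n",
    "print(df.isnull().any().any())    # is anything missing at all?\n",
    "print(df.notnull().all(axis=1))   # which rows are complete?\n",
    "```"
   ]
  },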
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 1396 entries, 0 to 1395\n",
      "Data columns (total 5 columns):\n",
      "0    1396 non-null object\n",
      "1    1396 non-null object\n",
      "2    1396 non-null object\n",
      "3    1098 non-null float64\n",
      "4    1098 non-null float64\n",
      "dtypes: float64(2), object(3)\n",
      "memory usage: 54.7+ KB\n"
     ]
    }
   ],
   "source": [
    "# To check for missing values, start with info for an overview.\n",
    "# info shows, for each column, the number of non-null entries and the dtype.\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       0      1      2      3      4\n",
       "0  False  False  False  False  False\n",
       "1  False  False  False  False  False\n",
       "2  False  False  False   True   True\n",
       "3  False  False  False  False  False\n",
       "4  False  False  False   True   True"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
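    "# isnull returns a Boolean DataFrame; any() on a column reports whether it contains missing values (column 2 has none, column 3 does).\n",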
    "display(df.isnull().head())\n",
    "display(df[2].isnull().any())\n",
    "display(df[3].isnull().any())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
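    "### Dropping missing values\n",
    "Missing values can be dropped with dropna.\n",
    "\n",
    "Note:\n",
    "* how: the drop rule. The default, any, drops a row (or column) if it contains any missing value; all drops it only if every value is missing.\n",
    "* axis: whether to drop rows (axis=0, the default) or columns (axis=1).\n",
    "* thresh: keep a row (or column) only if it has at least this many non-missing values; otherwise drop it.\n",
    "* inplace: whether to modify the object in place, default False."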
   ]
  },
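  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of how, thresh and axis is easiest to see on a tiny frame. A minimal sketch on hypothetical data, chosen so that each call keeps a different set of rows or columns:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Non-missing values per row: 3, 1, 0, 2; per column: a=2, b=3, c=1\n",
    "df = pd.DataFrame({'a': [1, np.nan, np.nan, 4],\n",
    "                   'b': [2, 5, np.nan, 8],\n",
    "                   'c': [3, np.nan, np.nan, np.nan]})\n",
    "\n",
    "print(df.dropna().index.tolist())                    # how='any': only complete rows\n",
    "print(df.dropna(how='all').index.tolist())           # drop only rows that are entirely missing\n",
    "print(df.dropna(thresh=2).index.tolist())            # keep rows with at least 2 non-missing values\n",
    "print(df.dropna(axis=1, thresh=2).columns.tolist())  # same rule applied to columns\n",
    "```"
   ]
  },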
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 1098 entries, 0 to 1395\n",
      "Data columns (total 5 columns):\n",
      "0    1098 non-null object\n",
      "1    1098 non-null object\n",
      "2    1098 non-null object\n",
      "3    1098 non-null float64\n",
      "4    1098 non-null float64\n",
      "dtypes: float64(2), object(3)\n",
      "memory usage: 51.5+ KB\n",
      "<class 'pandas.core.frame.DataFrame'>\n",
      "Int64Index: 1396 entries, 0 to 1395\n",
      "Data columns (total 5 columns):\n",
      "0    1396 non-null object\n",
      "1    1396 non-null object\n",
      "2    1396 non-null object\n",
      "3    1098 non-null float64\n",
      "4    1098 non-null float64\n",
      "dtypes: float64(2), object(3)\n",
      "memory usage: 65.4+ KB\n"
     ]
    }
   ],
   "source": [
    "display(df.head())\n",
    "\n",
    "# Handle missing values: drop them with dropna.\n",
    "df.dropna().info()\n",
    "\n",
    "# By default how='any': a row (or a column, with axis=1) is dropped if it contains any NaN. Set how='all' to drop it only when all of its values are NaN.\n",
    "df.dropna(how=\"all\").info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  \n",
       "1                                        我;张国荣;80;励志  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...  \n",
       "3                                   my way;张敬轩;90;励志  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# By default, rows containing NaN are dropped; pass axis=1 to drop columns instead (axis=0 drops rows, axis=1 drops columns).\n",
    "display(df.dropna(axis=1).head())\n",
    "\n",
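    "# Sometimes neither how='any' nor how='all' fits: 'any' drops too readily (a single NaN is enough), while 'all' drops too rarely (every value must be NaN). thresh sets a custom threshold:\n",
    "# a row (or column) is kept only if it has at least thresh non-NaN values; otherwise it is dropped.\n",
    "df.dropna(thresh=3).head()\n",
    "# inplace controls whether df is modified in place (default False)."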
   ]
  },
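  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "On the real dataset the effect of each `dropna` option is hard to see at a glance. As a minimal sketch, the cell below applies the same options to a small hypothetical frame (`toy` and its columns `a`, `b`, `c` are made up for illustration), so the kept rows and columns can be checked by eye."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy frame: row 0 is complete, row 1 has one NaN,\n",
    "# row 2 has two NaNs and row 3 is entirely NaN.\n",
    "toy = pd.DataFrame({\n",
    "    'a': [1.0, 2.0, np.nan, np.nan],\n",
    "    'b': [1.0, np.nan, np.nan, np.nan],\n",
    "    'c': [1.0, 3.0, 4.0, np.nan],\n",
    "})\n",
    "\n",
    "any_rows = toy.dropna()                     # how='any': drop rows with at least one NaN\n",
    "all_rows = toy.dropna(how='all')            # drop only rows whose values are all NaN\n",
    "thresh_rows = toy.dropna(thresh=2)          # keep rows with at least 2 non-NaN values\n",
    "thresh_cols = toy.dropna(axis=1, thresh=3)  # keep columns with at least 3 non-NaN values\n",
    "\n",
    "print(list(any_rows.index))       # [0]\n",
    "print(list(all_rows.index))       # [0, 1, 2]\n",
    "print(list(thresh_rows.index))    # [0, 1]\n",
    "print(list(thresh_cols.columns))  # ['c']"
   ]
  },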
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
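    "### Filling missing values\n",
    "Instead of dropping missing values, we can fill them with fillna.\n",
    "\n",
    "Parameters:\n",
    "* value: the value used for filling. It can also be a dict, which lets you give each column of the DataFrame its own fill value.\n",
    "* method: fill with the previous valid value (pad / ffill) or with the next valid value (backfill / bfill). Since pandas 2.1 this parameter is deprecated in favor of the ffill() and bfill() methods.\n",
    "* limit: if method is given, the maximum number of consecutive NaNs to fill; if not, the maximum number of NaNs to fill along the axis (per column by default).\n",
    "* inplace: whether to modify the object in place. Default False."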
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>10000.0</td>\n",
       "      <td>10000.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2        3         4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌    216.0   1392.68  \n",
       "1                                        我;张国荣;80;励志    273.0   1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...  10000.0  10000.00  \n",
       "3                                   my way;张敬轩;90;励志     52.0    337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...  10000.0  10000.00  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>1000.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>5000.0</td>\n",
       "      <td>1000.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2       3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌   216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志   273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...  5000.0  1000.00  \n",
       "3                                   my way;张敬轩;90;励志    52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...  5000.0  1000.00  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Fill missing values with fillna.\n",
    "# Fill every column with the same constant.\n",
    "df1 = df.fillna(10000)\n",
    "display(df1.head())\n",
    "\n",
    "# Pass a dict to fill different columns with different values.\n",
    "# Dict keys are column labels; values are the fill values.\n",
    "df2 = df.fillna({3: 5000, 4: 1000})\n",
    "display(df2.head())"
   ]
  },
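  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The keys of the dict passed to fillna select columns, not rows. As a minimal sketch on a hypothetical frame whose integer column labels mimic `df` (the values are made up), any column not listed in the dict keeps its NaNs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame: columns 3 and 4 mimic df, column 5 is extra.\n",
    "toy = pd.DataFrame({\n",
    "    3: [216.0, np.nan, 52.0],\n",
    "    4: [1392.68, np.nan, np.nan],\n",
    "    5: [np.nan, 1.0, 2.0],\n",
    "})\n",
    "\n",
    "# Keys select columns, values give each column's fill value.\n",
    "# Column 5 is not in the dict, so its NaN is left as-is.\n",
    "filled = toy.fillna({3: 5000, 4: 1000})\n",
    "print(filled)"
   ]
  },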
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "5   2015-5-28   http://www.favolist.com/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...  273.0  1447.17  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...   52.0   337.27  \n",
       "5                                        屋顶;温岚;80;励志  217.0   903.29  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "5   2015-5-28   http://www.favolist.com/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...   52.0   337.27  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...  217.0   903.29  \n",
       "5                                        屋顶;温岚;80;励志  217.0   903.29  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
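    "# method fills forward or backward. This is mainly useful when consecutive records are closely related (e.g. follow a trend), such as house prices or stock prices.\n",
    "\n",
    "# Fill with the previous valid value.\n",
    "display(df.fillna(method=\"ffill\").head(6))\n",
    "# Fill with the next valid value.\n",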
    "display(df.fillna(method=\"bfill\").head(6))"
   ]
  },
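  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In pandas 2.1 and later, `fillna(method=...)` is deprecated, and the dedicated `ffill()` and `bfill()` methods give the same result. A minimal sketch on a hypothetical price series (the values are made up for illustration):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical daily prices with gaps.\n",
    "prices = pd.Series([10.0, np.nan, np.nan, 13.0, np.nan])\n",
    "\n",
    "forward = prices.ffill()   # same result as prices.fillna(method='ffill')\n",
    "backward = prices.bfill()  # same result as prices.fillna(method='bfill')\n",
    "\n",
    "print(forward.tolist())   # [10.0, 10.0, 10.0, 13.0, 13.0]\n",
    "# The last value stays NaN: bfill has no later valid value to copy.\n",
    "print(backward.tolist())  # [10.0, 13.0, 13.0, 13.0, nan]"
   ]
  },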
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "5   2015-5-28   http://www.favolist.com/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...   52.0   337.27  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...  217.0   903.29  \n",
       "5                                        屋顶;温岚;80;励志  217.0   903.29  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>50000.0</td>\n",
       "      <td>50000.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>50000.0</td>\n",
       "      <td>50000.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "5   2015-5-28   http://www.favolist.com/   \n",
       "\n",
       "                                                   2        3         4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌    216.0   1392.68  \n",
       "1                                        我;张国荣;80;励志    273.0   1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...  50000.0  50000.00  \n",
       "3                                   my way;张敬轩;90;励志     52.0    337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...  50000.0  50000.00  \n",
       "5                                        屋顶;温岚;80;励志    217.0    903.29  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>aa</td>\n",
       "      <td>aa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>aa</td>\n",
       "      <td>aa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>217</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "5   2015-5-28   http://www.favolist.com/   \n",
       "\n",
       "                                                   2    3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216  1392.68  \n",
       "1                                        我;张国荣;80;励志  273  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...   aa       aa  \n",
       "3                                   my way;张敬轩;90;励志   52   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...   aa       aa  \n",
       "5                                        屋顶;温岚;80;励志  217   903.29  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# limit参数。如果指定method，则表示最多连续填充。\n",
    "# 如果没有指定method，则表示总共填充。\n",
    "display(df.fillna(method=\"bfill\", limit=1).head(6))\n",
    "display(df.fillna(value=50000, limit=2).head(6))\n",
    "\n",
    "# 设置是否就地修改，默认为False。\n",
    "df.fillna(\"aa\", inplace=True)\n",
    "display(df.head(6))"
   ]
  },
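  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `limit` semantics are easiest to see on a tiny column. The sketch below uses a hypothetical one-column frame `s` with a single run of three NaNs. It calls `bfill(limit=...)`, which pandas recommends in place of the deprecated `fillna(method='bfill', ...)`:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical data: one run of three consecutive NaNs\n",
    "s = pd.DataFrame({'x': [1.0, np.nan, np.nan, np.nan, 5.0]})\n",
    "\n",
    "# With a fill method, limit=1 fills at most one NaN in each consecutive gap\n",
    "r1 = s.bfill(limit=1)['x']\n",
    "# Without a method, limit=2 fills at most two NaNs in the whole column\n",
    "r2 = s.fillna(value=0, limit=2)['x']\n",
    "\n",
    "print(r1.tolist())  # [1.0, nan, nan, 5.0, 5.0]\n",
    "print(r2.tolist())  # [1.0, 0.0, 0.0, nan, 5.0]\n",
    "```\n",
    "\n",
    "The backward fill only reaches the NaN next to 5.0. The value fill stops after the first two NaNs counted from the top of the column."
   ]
  },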
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
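    "## 2.2 Handling Invalid Values\n",
    "### Detecting invalid values\n",
    "The `describe` method of a DataFrame summarizes the data, and the statistics it reports depend on the column type. Numeric columns get count, mean, std, min, the 25%/50%/75% quartiles and max. Non-numeric (object) columns get count, unique, top (the most frequent value) and freq (how often top occurs). Implausible values, such as an unexpected min or max or a suspicious top value, hint at invalid data."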
   ]
  },
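  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of how `describe` differs by column type. It uses a hypothetical frame `df_demo`; the column names and the deliberately invalid age -1 are made up for illustration:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical data: one numeric column and one text column\n",
    "df_demo = pd.DataFrame({\n",
    "    'age': [18, 19, 19, -1],  # -1 is an invalid age\n",
    "    'dept': ['IS', 'CS', 'IS', 'MA'],\n",
    "})\n",
    "\n",
    "# Default: only numeric columns are summarized (count, mean, std, min, quartiles, max)\n",
    "num_stats = df_demo.describe()\n",
    "# Text columns get count / unique / top / freq instead\n",
    "obj_stats = df_demo[['dept']].describe()\n",
    "# include='all' puts both kinds in one table, with NaN where a statistic does not apply\n",
    "all_stats = df_demo.describe(include='all')\n",
    "\n",
    "print(num_stats)\n",
    "print(obj_stats)\n",
    "# A negative min for an age column is a quick hint of invalid values\n",
    "print(num_stats.loc['min', 'age'])\n",
    "```"
   ]
  },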
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>1396</td>\n",
       "      <td>1396</td>\n",
       "      <td>1396</td>\n",
       "      <td>1396</td>\n",
       "      <td>1396</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>213</td>\n",
       "      <td>23</td>\n",
       "      <td>60</td>\n",
       "      <td>284</td>\n",
       "      <td>1094</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>2015-12-10</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>一起走过的日子;刘德华;80;伤感</td>\n",
       "      <td>aa</td>\n",
       "      <td>aa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>16</td>\n",
       "      <td>100</td>\n",
       "      <td>32</td>\n",
       "      <td>298</td>\n",
       "      <td>298</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                 0                          1                  2     3     4\n",
       "count         1396                       1396               1396  1396  1396\n",
       "unique         213                         23                 60   284  1094\n",
       "top     2015-12-10  http://www.movie.com/bor/  一起走过的日子;刘德华;80;伤感    aa    aa\n",
       "freq            16                        100                 32   298   298"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "display(df.head())\n",
    "\n",
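    "# If the DataFrame has numeric columns, describe() summarizes only those by default\n",
    "# (pass include=\"all\" to summarize every column).\n",
    "df.describe()\n",
    "\n",
    "# Numeric and non-numeric columns produce different statistics, e.g. for the text columns:\n",
    "# df[[0, 1, 2]].describe()"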
   ]
  },
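  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`describe` only hints at problems. In the describe output above, the placeholder `aa` (used earlier with `fillna`) is the most frequent value of columns 3 and 4, even though these columns should hold numbers. One way to locate such entries is `pd.to_numeric` with `errors='coerce'`, which turns anything that cannot be parsed as a number into NaN. The sketch below runs it on a hypothetical Series `raw` rather than on spider.csv itself:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical column that should be numeric but contains placeholder text\n",
    "raw = pd.Series(['216', '273', 'aa', '52', 'aa'])\n",
    "\n",
    "# Unparseable strings become NaN instead of raising an error\n",
    "nums = pd.to_numeric(raw, errors='coerce')\n",
    "\n",
    "# Invalid = a value was present, but it did not parse as a number\n",
    "invalid_mask = nums.isna() & raw.notna()\n",
    "print(raw[invalid_mask].tolist())  # ['aa', 'aa']\n",
    "```"
   ]
  },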
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
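    "## 2.3 Handling Duplicate Values\n",
    "Data often contains duplicate records, and these usually need to be removed.\n",
    "\n",
    "### Finding duplicates\n",
    "The `duplicated` method finds duplicates. It returns a boolean Series with one entry per row, indicating whether that row duplicates another row of the DataFrame. By default, a row is marked True if an identical row appears anywhere before it. The two rows do not need to be adjacent.\n",
    "\n",
    "Parameters:\n",
    "* subset: the columns used to decide whether rows are duplicates; defaults to all columns.\n",
    "* keep: which occurrences are marked as duplicates (True). The default is 'first', which marks every occurrence except the first. 'last' marks every occurrence except the last. False marks all occurrences.\n",
    "\n",
    "### Removing duplicates\n",
    "The `drop_duplicates` method removes duplicate rows.\n",
    "\n",
    "Parameters:\n",
    "* subset: the columns used to decide whether rows are duplicates; defaults to all columns.\n",
    "* keep: which occurrence to keep. 'first' (the default) keeps the first, 'last' keeps the last, and False drops every duplicated row.\n",
    "* inplace: whether to modify the DataFrame in place; defaults to False."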
   ]
  },
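  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the `keep` and `subset` options of `duplicated`. It uses a hypothetical four-row frame `d` in which rows 0 and 2 are identical, and rows 0, 2 and 3 share the value 1 in column `a`:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical data: rows 0 and 2 are full duplicates\n",
    "d = pd.DataFrame({'a': [1, 2, 1, 1], 'b': ['x', 'y', 'x', 'z']})\n",
    "\n",
    "first = d.duplicated()              # keep='first': all but the first occurrence are True\n",
    "last = d.duplicated(keep='last')    # all but the last occurrence are True\n",
    "every = d.duplicated(keep=False)    # every member of a duplicate group is True\n",
    "by_a = d.duplicated(subset=['a'])   # compare on column 'a' only\n",
    "\n",
    "print(first.tolist())  # [False, False, True, False]\n",
    "print(last.tolist())   # [True, False, False, False]\n",
    "print(every.tolist())  # [True, False, True, False]\n",
    "print(by_a.tolist())   # [False, False, True, True]\n",
    "```"
   ]
  },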
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            0                          1  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                   2      3        4  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>509</th>\n",
       "      <td>2015-12-7</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《爱之初体验》;2015.8.7;2015.8.23;上海锦瑟天下影视有限公司;海涛;张超，...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>658</th>\n",
       "      <td>2015-12-6</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>733</th>\n",
       "      <td>2015-12-6</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《一念天堂》;2015.12.31;2016.2.13;天河盛宴，凯德盛世（北京）投资管理有...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>760</th>\n",
       "      <td>2015-12-17</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《百团大战》;2015.8.28;2015.10.11;八一电影制片厂；中国电影股份有限公司...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>812</th>\n",
       "      <td>2015-12-10</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《坏蛋必须死》;2015.11.27;2015.12.20;北京新力量、华谊兄弟、南京大道行...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>914</th>\n",
       "      <td>2015-12-23</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《万物生长》;2015.4.17;2015.5.24;北京劳雷影业、杭州果麦文化传媒、北京联...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>947</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>986</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《简单爱》;2015.7.3;2015.7.19;中视合利（北京）文化投资有限公司一鸣影业公...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1087</th>\n",
       "      <td>2015-12-13</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1094</th>\n",
       "      <td>2015-12-10</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1224</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《一念天堂》;2015.12.31;2016.2.13;天河盛宴，凯德盛世（北京）投资管理有...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1261</th>\n",
       "      <td>2015-12-2</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1286</th>\n",
       "      <td>2015-12-4</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               0                                1  \\\n",
       "509    2015-12-7  http://www.movie.com/bor/12386／   \n",
       "658    2015-12-6        http://www.movie.com/bor/   \n",
       "733    2015-12-6  http://www.movie.com/bor/12386／   \n",
       "760   2015-12-17  http://www.movie.com/bor/12386／   \n",
       "812   2015-12-10  http://www.movie.com/bor/12386／   \n",
       "914   2015-12-23        http://www.movie.com/bor/   \n",
       "947   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "986   2015-12-14        http://www.movie.com/bor/   \n",
       "1087  2015-12-13        http://www.movie.com/dor/   \n",
       "1094  2015-12-10  http://www.movie.com/bor/12386／   \n",
       "1224  2015-12-19        http://www.movie.com/bor/   \n",
       "1261   2015-12-2        http://www.movie.com/dor/   \n",
       "1286   2015-12-4        http://www.movie.com/dor/   \n",
       "\n",
       "                                                      2   3   4  \n",
       "509   《爱之初体验》;2015.8.7;2015.8.23;上海锦瑟天下影视有限公司;海涛;张超，... NaN NaN  \n",
       "658   《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影... NaN NaN  \n",
       "733   《一念天堂》;2015.12.31;2016.2.13;天河盛宴，凯德盛世（北京）投资管理有... NaN NaN  \n",
       "760   《百团大战》;2015.8.28;2015.10.11;八一电影制片厂；中国电影股份有限公司... NaN NaN  \n",
       "812   《坏蛋必须死》;2015.11.27;2015.12.20;北京新力量、华谊兄弟、南京大道行... NaN NaN  \n",
       "914   《万物生长》;2015.4.17;2015.5.24;北京劳雷影业、杭州果麦文化传媒、北京联... NaN NaN  \n",
       "947   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;... NaN NaN  \n",
       "986   《简单爱》;2015.7.3;2015.7.19;中视合利（北京）文化投资有限公司一鸣影业公... NaN NaN  \n",
       "1087  《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛... NaN NaN  \n",
       "1094  《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛... NaN NaN  \n",
       "1224  《一念天堂》;2015.12.31;2016.2.13;天河盛宴，凯德盛世（北京）投资管理有... NaN NaN  \n",
       "1261  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三... NaN NaN  \n",
       "1286  《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，... NaN NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0       False\n",
       "1       False\n",
       "2       False\n",
       "3       False\n",
       "4       False\n",
       "        ...  \n",
       "1391     True\n",
       "1392    False\n",
       "1393     True\n",
       "1394     True\n",
       "1395    False\n",
       "Length: 1396, dtype: bool"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "display(df.head())\n",
    "\n",
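    "# Check whether any duplicate rows exist\n",
    "display(df.duplicated().any())\n",
    "\n",
    "# Show the duplicate records (every occurrence after the first)\n",
    "display(df[df.duplicated()])\n",
    "\n",
    "# To see every record involved in duplication, use the keep parameter:\n",
    "# 'first' (default): every occurrence except the first is marked True.\n",
    "# 'last': every occurrence except the last is marked True.\n",
    "# False: every occurrence is marked True.\n",
    "display(df[df.duplicated(keep=False)])\n",
    "\n",
    "# subset sets the duplication rule; by default rows are duplicates only if all columns match.\n",
    "# Here, rows count as duplicates as soon as columns 0 and 1 match.\n",
    "display(df.duplicated(subset=[0, 1]))"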
   ]
  },
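  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`drop_duplicates` accepts the same `subset` and `keep` options. A minimal sketch, again on a hypothetical frame `d` where rows 0 and 2 are identical and rows 0, 2 and 3 share `a == 1`:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "d = pd.DataFrame({'a': [1, 2, 1, 1], 'b': ['x', 'y', 'x', 'z']})\n",
    "\n",
    "# Drop full-row duplicates, keeping the first occurrence (default)\n",
    "kept_first = d.drop_duplicates()\n",
    "# Compare on column 'a' only and keep the last occurrence of each value\n",
    "kept_last_a = d.drop_duplicates(subset=['a'], keep='last')\n",
    "# keep=False drops every row that has a duplicate\n",
    "no_dups = d.drop_duplicates(keep=False)\n",
    "\n",
    "print(kept_first.index.tolist())   # [0, 1, 3]\n",
    "print(kept_last_a.index.tolist())  # [1, 3]\n",
    "print(no_dups.index.tolist())      # [1, 3]\n",
    "\n",
    "# inplace=True modifies d itself and returns None\n",
    "result = d.drop_duplicates(inplace=True)\n",
    "print(result, len(d))  # None 3\n",
    "```"
   ]
  },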
  {
   "cell_type": "code",
   "execution_count": 103,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>217.0</td>\n",
       "      <td>903.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2015-8-1</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>小兔子乖乖;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>184.0</td>\n",
       "      <td>473.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2015-8-6</td>\n",
       "      <td>http://www.99inf.com/</td>\n",
       "      <td>光辉岁月;beyond;80;励志</td>\n",
       "      <td>72.0</td>\n",
       "      <td>1051.73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2015-6-28</td>\n",
       "      <td>http://www.qudee.com/</td>\n",
       "      <td>逃脱;李玟;90;伤感</td>\n",
       "      <td>123.0</td>\n",
       "      <td>483.23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2015-8-7</td>\n",
       "      <td>http://www.alifenfen.com/</td>\n",
       "      <td>星;邓丽君;80;励志</td>\n",
       "      <td>257.0</td>\n",
       "      <td>1779.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>2015-7-15</td>\n",
       "      <td>http://www.waaku.com/</td>\n",
       "      <td>我心是海洋;蔡琴;80;励志</td>\n",
       "      <td>210.0</td>\n",
       "      <td>1240.49</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>2015-4-5</td>\n",
       "      <td>http://www.yifawang.cn/</td>\n",
       "      <td>同道中人;张国荣;80;励志</td>\n",
       "      <td>42.0</td>\n",
       "      <td>963.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2015-12-22</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《将错就错》;2015.3.5;2015.3.29;中国电影股份有限公司等;王宁;小沈阳，田...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>2015-6-29</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>借过;容祖儿;90;伤感</td>\n",
       "      <td>16.0</td>\n",
       "      <td>1869.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>2015-4-26</td>\n",
       "      <td>http://info.tianya.cn</td>\n",
       "      <td>最冷一天;陈奕迅;90;伤感</td>\n",
       "      <td>259.0</td>\n",
       "      <td>554.35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>2015-4-6</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>加油;林俊杰/mc hotdog;90;励志</td>\n",
       "      <td>252.0</td>\n",
       "      <td>644.22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>2015-7-27</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>firework;katy perry;90;励志</td>\n",
       "      <td>71.0</td>\n",
       "      <td>697.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>2015-6-8</td>\n",
       "      <td>http://beijing.faxinxi.cn/</td>\n",
       "      <td>给所有知道我名字的人;赵传;80;励志</td>\n",
       "      <td>174.0</td>\n",
       "      <td>1258.13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>2015-7-13</td>\n",
       "      <td>http://www.denghuo.com/</td>\n",
       "      <td>逃脱;李玟;90;伤感</td>\n",
       "      <td>253.0</td>\n",
       "      <td>333.04</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2015-12-15</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>2015-12-1</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>2015-8-5</td>\n",
       "      <td>http://www.waaku.com/</td>\n",
       "      <td>永远不要说放弃;童安格;80;励志</td>\n",
       "      <td>261.0</td>\n",
       "      <td>1353.83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>2015-5-31</td>\n",
       "      <td>http://www.alifenfen.com/</td>\n",
       "      <td>忘记拥抱;a-lin;80;伤感</td>\n",
       "      <td>260.0</td>\n",
       "      <td>112.41</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>2015-12-26</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>2015-8-30</td>\n",
       "      <td>http://www.qudee.com/</td>\n",
       "      <td>屋顶;温岚;80;励志</td>\n",
       "      <td>196.0</td>\n",
       "      <td>1847.63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>2015-4-20</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>story of my life;bon jovi;90;励志</td>\n",
       "      <td>214.0</td>\n",
       "      <td>1860.87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>2015-6-21</td>\n",
       "      <td>http://www.denghuo.com/</td>\n",
       "      <td>借我;谢春花;90;伤感</td>\n",
       "      <td>272.0</td>\n",
       "      <td>937.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>2015-3-19</td>\n",
       "      <td>http://beijing.hand2hand.cn/</td>\n",
       "      <td>太阳星辰;张学友;80;励志</td>\n",
       "      <td>19.0</td>\n",
       "      <td>1802.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>2015-5-1</td>\n",
       "      <td>http://www.010y.com/</td>\n",
       "      <td>一起走过的日子;刘德华;80;伤感</td>\n",
       "      <td>174.0</td>\n",
       "      <td>621.15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>2015-12-5</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《一路惊喜》;2015.2.6;2015.3.8;万达影视传媒有限公司;金依萌/潘安子/章家...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>381</th>\n",
       "      <td>2015-4-9</td>\n",
       "      <td>http://www.99inf.com/</td>\n",
       "      <td>壮志雄心;李克勤;90;励志</td>\n",
       "      <td>164.0</td>\n",
       "      <td>1368.60</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>391</th>\n",
       "      <td>2015-7-21</td>\n",
       "      <td>http://beijing.faxinxi.cn/</td>\n",
       "      <td>无悔这一生;beyond;80;励志</td>\n",
       "      <td>201.0</td>\n",
       "      <td>927.21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>397</th>\n",
       "      <td>2015-3-29</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>感恩的心;欧阳菲菲;90;励志</td>\n",
       "      <td>260.0</td>\n",
       "      <td>1394.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>399</th>\n",
       "      <td>2015-3-16</td>\n",
       "      <td>http://bj.454.cn/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>207.0</td>\n",
       "      <td>907.09</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>415</th>\n",
       "      <td>2015-7-23</td>\n",
       "      <td>http://www.ezxun.com/</td>\n",
       "      <td>story of my life;bon jovi;90;励志</td>\n",
       "      <td>267.0</td>\n",
       "      <td>1530.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>423</th>\n",
       "      <td>2015-3-20</td>\n",
       "      <td>http://beijing.faxinxi.cn/</td>\n",
       "      <td>我心是海洋;蔡琴;80;励志</td>\n",
       "      <td>260.0</td>\n",
       "      <td>1207.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>430</th>\n",
       "      <td>2015-6-2</td>\n",
       "      <td>http://www.wanxn.com/</td>\n",
       "      <td>像我一样骄傲;赵传;80;励志</td>\n",
       "      <td>57.0</td>\n",
       "      <td>320.97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>436</th>\n",
       "      <td>2015-5-20</td>\n",
       "      <td>http://www.ezxun.com/</td>\n",
       "      <td>匿名的好友;杨丞琳;90;伤感</td>\n",
       "      <td>218.0</td>\n",
       "      <td>253.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>439</th>\n",
       "      <td>2015-5-26</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>逃脱;李玟;90;伤感</td>\n",
       "      <td>88.0</td>\n",
       "      <td>77.78</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>455</th>\n",
       "      <td>2015-7-10</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>the climb;miley cyrus;80;励志</td>\n",
       "      <td>85.0</td>\n",
       "      <td>1017.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>468</th>\n",
       "      <td>2015-6-3</td>\n",
       "      <td>http://www.wuhan58.com/index.php</td>\n",
       "      <td>加油;林俊杰/mc hotdog;90;励志</td>\n",
       "      <td>84.0</td>\n",
       "      <td>842.93</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>496</th>\n",
       "      <td>2015-4-3</td>\n",
       "      <td>http://www.99inf.com/</td>\n",
       "      <td>the climb;miley cyrus;80;励志</td>\n",
       "      <td>161.0</td>\n",
       "      <td>364.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>507</th>\n",
       "      <td>2015-3-12</td>\n",
       "      <td>http://www.99inf.com/</td>\n",
       "      <td>无悔这一生;beyond;80;励志</td>\n",
       "      <td>123.0</td>\n",
       "      <td>1860.15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>513</th>\n",
       "      <td>2015-8-22</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>同道中人;张国荣;80;励志</td>\n",
       "      <td>59.0</td>\n",
       "      <td>210.59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>523</th>\n",
       "      <td>2015-4-30</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>我心是海洋;蔡琴;80;励志</td>\n",
       "      <td>262.0</td>\n",
       "      <td>250.25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>548</th>\n",
       "      <td>2015-6-13</td>\n",
       "      <td>http://www.wuhan58.com/index.php</td>\n",
       "      <td>光辉岁月;beyond;80;励志</td>\n",
       "      <td>253.0</td>\n",
       "      <td>62.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>549</th>\n",
       "      <td>2015-8-9</td>\n",
       "      <td>http://www.wuhan58.com/index.php</td>\n",
       "      <td>借我;谢春花;90;伤感</td>\n",
       "      <td>105.0</td>\n",
       "      <td>1155.53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>557</th>\n",
       "      <td>2015-4-7</td>\n",
       "      <td>http://www.denghuo.com/</td>\n",
       "      <td>数鸭子;少儿歌曲;90;儿歌</td>\n",
       "      <td>9.0</td>\n",
       "      <td>354.20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>566</th>\n",
       "      <td>2015-3-30</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>同道中人;张国荣;80;励志</td>\n",
       "      <td>55.0</td>\n",
       "      <td>200.74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>599</th>\n",
       "      <td>2015-3-24</td>\n",
       "      <td>http://www.wanxn.com/</td>\n",
       "      <td>数鸭子;少儿歌曲;90;儿歌</td>\n",
       "      <td>267.0</td>\n",
       "      <td>348.71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>617</th>\n",
       "      <td>2015-4-16</td>\n",
       "      <td>http://www.alifenfen.com/</td>\n",
       "      <td>感恩的心;欧阳菲菲;90;励志</td>\n",
       "      <td>69.0</td>\n",
       "      <td>1312.97</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>623</th>\n",
       "      <td>2015-6-23</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>太阳星辰;张学友;80;励志</td>\n",
       "      <td>50.0</td>\n",
       "      <td>63.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>655</th>\n",
       "      <td>2015-7-19</td>\n",
       "      <td>http://www.denghuo.com/</td>\n",
       "      <td>爱让世界更美;童安格;90;励志</td>\n",
       "      <td>181.0</td>\n",
       "      <td>99.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>660</th>\n",
       "      <td>2015-8-15</td>\n",
       "      <td>http://www.yifawang.cn/</td>\n",
       "      <td>拔萝卜;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>215.0</td>\n",
       "      <td>1605.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>693</th>\n",
       "      <td>2015-6-9</td>\n",
       "      <td>http://www.yifawang.cn/</td>\n",
       "      <td>壮志雄心;李克勤;90;励志</td>\n",
       "      <td>238.0</td>\n",
       "      <td>1032.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>720</th>\n",
       "      <td>2015-4-24</td>\n",
       "      <td>http://bj.454.cn/</td>\n",
       "      <td>路...一直都在;陈奕迅;90;励志</td>\n",
       "      <td>49.0</td>\n",
       "      <td>90.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>728</th>\n",
       "      <td>2015-3-23</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>来来回回;陈楚生&amp;spy.c;90;伤感</td>\n",
       "      <td>58.0</td>\n",
       "      <td>584.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>731</th>\n",
       "      <td>2015-6-18</td>\n",
       "      <td>http://www.mouxiao.com/</td>\n",
       "      <td>数鸭子;少儿歌曲;90;儿歌</td>\n",
       "      <td>239.0</td>\n",
       "      <td>378.40</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>807</th>\n",
       "      <td>2015-3-13</td>\n",
       "      <td>http://www.qudee.com/</td>\n",
       "      <td>数鸭子;少儿歌曲;90;儿歌</td>\n",
       "      <td>9.0</td>\n",
       "      <td>1668.50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>922</th>\n",
       "      <td>2015-8-11</td>\n",
       "      <td>http://www.010y.com/</td>\n",
       "      <td>无悔这一生;beyond;80;励志</td>\n",
       "      <td>88.0</td>\n",
       "      <td>80.05</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>213 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "              0                          1  \\\n",
       "0     2015-4-28    http://www.apinpai.com/   \n",
       "1     2015-8-24    http://www.apinpai.com/   \n",
       "2    2015-12-14  http://www.movie.com/dor/   \n",
       "3      2015-4-2       http://bj.qu114.com/   \n",
       "4    2015-12-19  http://www.movie.com/dor/   \n",
       "..          ...                        ...   \n",
       "720   2015-4-24          http://bj.454.cn/   \n",
       "728   2015-3-23       http://bj.qu114.com/   \n",
       "731   2015-6-18    http://www.mouxiao.com/   \n",
       "807   2015-3-13      http://www.qudee.com/   \n",
       "922   2015-8-11       http://www.010y.com/   \n",
       "\n",
       "                                                     2      3        4  \n",
       "0                                  采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                          我;张国荣;80;励志  273.0  1447.17  \n",
       "2    《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                     my way;张敬轩;90;励志   52.0   337.27  \n",
       "4    《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  \n",
       "..                                                 ...    ...      ...  \n",
       "720                                 路...一直都在;陈奕迅;90;励志   49.0    90.50  \n",
       "728                               来来回回;陈楚生&spy.c;90;伤感   58.0   584.38  \n",
       "731                                     数鸭子;少儿歌曲;90;儿歌  239.0   378.40  \n",
       "807                                     数鸭子;少儿歌曲;90;儿歌    9.0  1668.50  \n",
       "922                                 无悔这一生;beyond;80;励志   88.0    80.05  \n",
       "\n",
       "[213 rows x 5 columns]"
      ]
     },
     "execution_count": 103,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Drop rows whose column-0 value already appeared in an earlier row (the first occurrence is kept)\n",
    "df.drop_duplicates([0])"
   ]
  },
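  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "drop_duplicates also takes a keep parameter that decides which copy survives, and duplicated returns the underlying boolean mask. The sketch below uses a small hypothetical frame named toy (made-up values, not the real spider.csv data) so that it does not overwrite df. The index lists in the comments hold only for that toy data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data with integer column labels, like df read with header=None\n",
    "toy = pd.DataFrame({0: [\"2015-4-28\", \"2015-4-28\", \"2015-8-24\", \"2015-4-28\"],\n",
    "                    1: [\"a\", \"b\", \"c\", \"a\"]})\n",
    "\n",
    "# duplicated() marks rows whose column-0 value already appeared earlier\n",
    "dup_mask = toy.duplicated([0])                     # [False, True, False, True]\n",
    "# keep=\"first\" (the default) keeps the first occurrence of each value\n",
    "keep_first = toy.drop_duplicates([0])              # index [0, 2]\n",
    "# keep=\"last\" keeps the last occurrence instead\n",
    "keep_last = toy.drop_duplicates([0], keep=\"last\")  # index [2, 3]\n",
    "# keep=False drops every row whose value occurs more than once\n",
    "no_dups = toy.drop_duplicates([0], keep=False)     # index [2]\n",
    "# Without a column list, whole rows are compared\n",
    "whole_rows = toy.drop_duplicates()                 # index [0, 1, 2]\n",
    "keep_last"
   ]
  },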
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
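    "# 3. Data Filtering\n",
    "Data can be filtered with a boolean array or with an index array.  \n",
    "Alternatively, the DataFrame query method can filter data. Inside a query expression, a variable defined outside the expression can be referenced by prefixing its name with @."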
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 112,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《闯入者》;2015.4.30;2015.5.24;冬春文化、银润传媒、合润传媒、安乐电影、...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>65</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>305</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>320</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>484</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>493</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《分手再说我爱你》;2015.12.24;2016.1.17;爱奇艺影业（北京）有限公司、太...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>751</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>752</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《爱情魔发师》;2015.7.17;2015.8.2;北京仁和博纳文化传媒有限公司;倾海;游...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>791</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>853</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>947</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1114</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1120</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1309</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1352</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "               0                                1  \\\n",
       "32    2015-12-25        http://www.movie.com/dor/   \n",
       "42    2015-12-25        http://www.movie.com/dor/   \n",
       "65    2015-12-25  http://www.movie.com/bor/12386／   \n",
       "305   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "320   2015-12-25        http://www.movie.com/bor/   \n",
       "484   2015-12-25        http://www.movie.com/dor/   \n",
       "493   2015-12-25        http://www.movie.com/dor/   \n",
       "751   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "752   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "791   2015-12-25        http://www.movie.com/bor/   \n",
       "853   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "947   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "1114  2015-12-25        http://www.movie.com/bor/   \n",
       "1120  2015-12-25        http://www.movie.com/bor/   \n",
       "1309  2015-12-25        http://www.movie.com/dor/   \n",
       "1352  2015-12-25        http://www.movie.com/dor/   \n",
       "\n",
       "                                                      2   3   4  \n",
       "32    《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三... NaN NaN  \n",
       "42    《闯入者》;2015.4.30;2015.5.24;冬春文化、银润传媒、合润传媒、安乐电影、... NaN NaN  \n",
       "65    《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨... NaN NaN  \n",
       "305   《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港... NaN NaN  \n",
       "320   《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛... NaN NaN  \n",
       "484   《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，... NaN NaN  \n",
       "493   《分手再说我爱你》;2015.12.24;2016.1.17;爱奇艺影业（北京）有限公司、太... NaN NaN  \n",
       "751   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;... NaN NaN  \n",
       "752   《爱情魔发师》;2015.7.17;2015.8.2;北京仁和博纳文化传媒有限公司;倾海;游... NaN NaN  \n",
       "791   《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港... NaN NaN  \n",
       "853   《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;... NaN NaN  \n",
       "947   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;... NaN NaN  \n",
       "1114  《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;... NaN NaN  \n",
       "1120  《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;... NaN NaN  \n",
       "1309  《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港... NaN NaN  \n",
       "1352  《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨... NaN NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《闯入者》;2015.4.30;2015.5.24;冬春文化、银润传媒、合润传媒、安乐电影、...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>65</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>305</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>320</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>484</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>493</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《分手再说我爱你》;2015.12.24;2016.1.17;爱奇艺影业（北京）有限公司、太...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>751</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>752</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《爱情魔发师》;2015.7.17;2015.8.2;北京仁和博纳文化传媒有限公司;倾海;游...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>791</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>853</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>947</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1114</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1120</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1309</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1352</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            date                              url  \\\n",
       "32    2015-12-25        http://www.movie.com/dor/   \n",
       "42    2015-12-25        http://www.movie.com/dor/   \n",
       "65    2015-12-25  http://www.movie.com/bor/12386／   \n",
       "305   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "320   2015-12-25        http://www.movie.com/bor/   \n",
       "484   2015-12-25        http://www.movie.com/dor/   \n",
       "493   2015-12-25        http://www.movie.com/dor/   \n",
       "751   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "752   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "791   2015-12-25        http://www.movie.com/bor/   \n",
       "853   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "947   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "1114  2015-12-25        http://www.movie.com/bor/   \n",
       "1120  2015-12-25        http://www.movie.com/bor/   \n",
       "1309  2015-12-25        http://www.movie.com/dor/   \n",
       "1352  2015-12-25        http://www.movie.com/dor/   \n",
       "\n",
       "                                                   name  num1  num2  \n",
       "32    《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...   NaN   NaN  \n",
       "42    《闯入者》;2015.4.30;2015.5.24;冬春文化、银润传媒、合润传媒、安乐电影、...   NaN   NaN  \n",
       "65    《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...   NaN   NaN  \n",
       "305   《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...   NaN   NaN  \n",
       "320   《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...   NaN   NaN  \n",
       "484   《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，...   NaN   NaN  \n",
       "493   《分手再说我爱你》;2015.12.24;2016.1.17;爱奇艺影业（北京）有限公司、太...   NaN   NaN  \n",
       "751   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...   NaN   NaN  \n",
       "752   《爱情魔发师》;2015.7.17;2015.8.2;北京仁和博纳文化传媒有限公司;倾海;游...   NaN   NaN  \n",
       "791   《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...   NaN   NaN  \n",
       "853   《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...   NaN   NaN  \n",
       "947   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...   NaN   NaN  \n",
       "1114  《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;...   NaN   NaN  \n",
       "1120  《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...   NaN   NaN  \n",
       "1309  《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...   NaN   NaN  \n",
       "1352  《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...   NaN   NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《闯入者》;2015.4.30;2015.5.24;冬春文化、银润传媒、合润传媒、安乐电影、...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>65</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>305</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>320</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>484</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>493</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《分手再说我爱你》;2015.12.24;2016.1.17;爱奇艺影业（北京）有限公司、太...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>751</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>752</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《爱情魔发师》;2015.7.17;2015.8.2;北京仁和博纳文化传媒有限公司;倾海;游...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>791</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>853</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>947</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/12386／</td>\n",
       "      <td>《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1114</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1120</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/bor/</td>\n",
       "      <td>《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1309</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1352</th>\n",
       "      <td>2015-12-25</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            date                              url  \\\n",
       "32    2015-12-25        http://www.movie.com/dor/   \n",
       "42    2015-12-25        http://www.movie.com/dor/   \n",
       "65    2015-12-25  http://www.movie.com/bor/12386／   \n",
       "305   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "320   2015-12-25        http://www.movie.com/bor/   \n",
       "484   2015-12-25        http://www.movie.com/dor/   \n",
       "493   2015-12-25        http://www.movie.com/dor/   \n",
       "751   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "752   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "791   2015-12-25        http://www.movie.com/bor/   \n",
       "853   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "947   2015-12-25  http://www.movie.com/bor/12386／   \n",
       "1114  2015-12-25        http://www.movie.com/bor/   \n",
       "1120  2015-12-25        http://www.movie.com/bor/   \n",
       "1309  2015-12-25        http://www.movie.com/dor/   \n",
       "1352  2015-12-25        http://www.movie.com/dor/   \n",
       "\n",
       "                                                   name  num1  num2  \n",
       "32    《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...   NaN   NaN  \n",
       "42    《闯入者》;2015.4.30;2015.5.24;冬春文化、银润传媒、合润传媒、安乐电影、...   NaN   NaN  \n",
       "65    《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...   NaN   NaN  \n",
       "305   《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...   NaN   NaN  \n",
       "320   《探灵档案》;2015.3.7;2015.3.22;壹马时代文化传媒（北京）有限公司、北京盛...   NaN   NaN  \n",
       "484   《破风》;2015.8.7;2015.9.13;恒大影视文化有限公司;林超贤;彭于晏，窦骁，...   NaN   NaN  \n",
       "493   《分手再说我爱你》;2015.12.24;2016.1.17;爱奇艺影业（北京）有限公司、太...   NaN   NaN  \n",
       "751   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...   NaN   NaN  \n",
       "752   《爱情魔发师》;2015.7.17;2015.8.2;北京仁和博纳文化传媒有限公司;倾海;游...   NaN   NaN  \n",
       "791   《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...   NaN   NaN  \n",
       "853   《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...   NaN   NaN  \n",
       "947   《怦然星动》;2015.12.3;2016.1.10;欢瑞世纪，嘉行传媒，青春光线;陈国辉;...   NaN   NaN  \n",
       "1114  《既然青春留不住》;2015.10.23;2015.11.22;杭州和润影视有限公司;田蒙;...   NaN   NaN  \n",
       "1120  《冲上云霄》;2015.2.19;2015.3.29;寰亚电影制作有限公司;叶伟信，邹凯光;...   NaN   NaN  \n",
       "1309  《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...   NaN   NaN  \n",
       "1352  《少年班》;2015.6.19;2015.7.19;工夫影业；华谊兄弟;肖洋;孙红雷，周冬雨...   NaN   NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "\n",
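    "# The usual way to filter data:\n",
    "# evaluate a condition to get a boolean array, then index the DataFrame with it to keep only the matching rows.\n",
    "mask = df[0] == \"2015-12-25\"\n",
    "display(df[mask])\n",
    "\n",
    "# The second way: the query method. It refers to columns by name, so assign column names first.\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
    "display(df.query(\"date == '2015-12-25'\"))\n",
    "\n",
    "# To use a variable defined outside the query string, prefix its name with the @ symbol.\n",
    "s = \"2015-12-25\"\n",
    "display(df.query(\"date == @s\"))"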
   ]
  },
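  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The index-array approach mentioned above, and ways to combine several conditions, can be sketched on a small hypothetical frame named demo (made-up values in the date/url/num1 shape used above). In a combined boolean filter each comparison needs its own parentheses, because & and | bind more tightly than == in Python."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data (not the real spider.csv contents)\n",
    "demo = pd.DataFrame({\n",
    "    \"date\": [\"2015-12-25\", \"2015-4-28\", \"2015-12-25\", \"2015-6-9\"],\n",
    "    \"url\": [\"http://www.movie.com/dor/\", \"http://www.apinpai.com/\",\n",
    "            \"http://www.movie.com/bor/\", \"http://www.yifawang.cn/\"],\n",
    "    \"num1\": [None, 216.0, None, 238.0],\n",
    "})\n",
    "\n",
    "# Index arrays: pick rows by a list of labels (loc) or of positions (iloc)\n",
    "by_label = demo.loc[[0, 2]]\n",
    "by_position = demo.iloc[[1, 3]]\n",
    "\n",
    "# Combine boolean arrays with & (and), | (or) and ~ (not)\n",
    "dor_dec25 = demo[(demo[\"date\"] == \"2015-12-25\") & demo[\"url\"].str.contains(\"/dor/\")]  # row 0\n",
    "not_movie = demo[~demo[\"url\"].str.startswith(\"http://www.movie.com\")]  # rows 1 and 3\n",
    "\n",
    "# isin() gives a boolean array for membership in a list of values\n",
    "in_dates = demo[demo[\"date\"].isin([\"2015-4-28\", \"2015-6-9\"])]  # rows 1 and 3\n",
    "\n",
    "# query() accepts and/or and can mix in @-prefixed variables\n",
    "limit = 220\n",
    "over_limit = demo.query(\"num1 > @limit and date != '2015-12-25'\")  # row 3\n",
    "over_limit"
   ]
  },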
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
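    "# 4. Data Transformation\n",
    "## 4.1 Apply and Map\n",
    "Series and DataFrame objects support mapping transformations at the row (column) level or at the element level. For a Series, call apply or map; for a DataFrame, call apply or applymap.  \n",
    "* apply: transforms data through a function. (A Series passes each element to the function; a DataFrame passes each row or column.)\n",
    "* map: maps each value of the Series. The argument can be a Series, a dict, or a function.\n",
    "* applymap: applies a function to every element of a DataFrame. (Since pandas 2.1, applymap is deprecated in favor of the equivalent DataFrame.map.)\n",
    "\n",
    "## 4.2 Replacement\n",
    "Series and DataFrame objects replace element values with the replace method. Its main parameters are:\n",
    "* to_replace: the value(s) to replace; accepts a single value, a list, a dict, or a regular expression.\n",
    "* regex: whether to interpret to_replace as a regular expression; defaults to False.\n",
    "\n",
    "## 4.3 Vectorized String Operations\n",
    "A Series has a str attribute (accessor) through which string methods are applied to every element at once."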
   ]
  },
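  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The replace method and the str accessor can be sketched on a few semicolon-separated strings. The first two values are copied from the data shown earlier; the \"-\" entry is a hypothetical placeholder for a missing value, added only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Two values copied from the data above plus a hypothetical \"-\" placeholder\n",
    "demo_s = pd.Series([\"数鸭子;少儿歌曲;90;儿歌\", \"我;张国荣;80;励志\", \"-\"])\n",
    "\n",
    "# 4.2 replace: to_replace as a single value, a list, or a dict\n",
    "rep_single = demo_s.replace(\"-\", np.nan)                   # \"-\" becomes NaN\n",
    "rep_list = demo_s.replace([\"-\", \"我;张国荣;80;励志\"], \"?\")  # both listed values become \"?\"\n",
    "rep_dict = demo_s.replace({\"-\": \"unknown\"})               # dict key -> dict value\n",
    "# With regex=True, matching substrings are substituted\n",
    "rep_regex = demo_s.replace(r\";\\d0;\", \";\", regex=True)      # removes the \";90;\" / \";80;\" field\n",
    "\n",
    "# 4.3 str accessor: string methods run on every element at once\n",
    "second = demo_s.str.split(\";\").str[1]  # second field, NaN where it does not exist\n",
    "has_90 = demo_s.str.contains(\";90;\")    # boolean Series, usable as a filter\n",
    "rep_regex"
   ]
  },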
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "2\n",
      "3\n",
      "4\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    11\n",
       "1    14\n",
       "2    19\n",
       "3    26\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    11\n",
       "1    14\n",
       "2    19\n",
       "3    26\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
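    "# The function passed to apply takes one parameter. For a Series, each element is passed in turn; for a DataFrame, each row (or each column) is passed in turn, depending on the value of axis.\n",
    "# The function's return value becomes the result for that element (or row/column).\n",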
    "def manage(x):\n",
    "    print(x)\n",
    "    return x ** 2 + 10\n",
    "    \n",
    "s = pd.Series([1, 2, 3, 4])\n",
    "s1 = s.apply(manage)\n",
    "display(s1)\n",
    "\n",
    "# For very simple functions, a lambda is enough.\n",
    "s2 = s.apply(lambda x: x ** 2 + 10)\n",
    "display(s2)"
   ]
  },
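  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For plain arithmetic such as x ** 2 + 10, apply is not strictly needed: arithmetic operators already work element-wise on a whole Series, and this vectorized form is typically faster than calling a Python function once per element. apply remains useful for logic with no vectorized equivalent, and its args parameter forwards extra positional arguments. A minimal comparison, where add_offset is a hypothetical helper:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "nums = pd.Series([1, 2, 3, 4])\n",
    "\n",
    "# apply: the lambda is called once for each element\n",
    "via_apply = nums.apply(lambda x: x ** 2 + 10)\n",
    "# Vectorized: the same arithmetic on the whole Series at once\n",
    "via_vector = nums ** 2 + 10\n",
    "\n",
    "# Hypothetical helper with an extra parameter, supplied through args=\n",
    "def add_offset(x, offset):\n",
    "    return x ** 2 + offset\n",
    "\n",
    "via_args = nums.apply(add_offset, args=(10,))\n",
    "via_args"
   ]
  },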
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a\n",
       "1    b\n",
       "2    c\n",
       "3    d\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    a\n",
       "1    b\n",
       "2    c\n",
       "3    d\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0    4\n",
       "1    5\n",
       "2    6\n",
       "3    7\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Series.map maps each value of the Series to a new value.\n",
    "s = pd.Series([1, 2, 3, 4])\n",
    "map_series = pd.Series([\"a\", \"b\", \"c\", \"d\"], index=[1, 2, 3, 4])\n",
    "\n",
    "# The argument can be a Series: each value is looked up in that Series' index, and the corresponding value becomes the result.\n",
    "s1 = s.map(map_series)\n",
    "display(s1)\n",
    "\n",
    "# The argument can also be a dict: each value is looked up among the keys, and the corresponding dict value becomes the result.\n",
    "map_dict = {1:\"a\", 2:\"b\", 3:\"c\", 4:\"d\"}\n",
    "s2 = s.map(map_dict)\n",
    "display(s2)\n",
    "\n",
    "# The argument can also be a function; in that case map behaves much like apply.\n",
    "s3 = s.map(lambda x: x + 3)\n",
    "display(s3)"
   ]
  },
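  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When map is given a function, missing values (NaN) are passed to that function too. Below is a minimal sketch on hypothetical data. A string method fails on NaN because NaN is a float, while `na_action='ignore'` skips missing values and keeps them as NaN in the result.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical Series with one missing value\n",
    "s = pd.Series(['a', np.nan, 'c'])\n",
    "\n",
    "# Without na_action, NaN reaches the lambda, and a float has no upper method\n",
    "error_message = ''\n",
    "try:\n",
    "    s.map(lambda x: x.upper())\n",
    "except AttributeError as err:\n",
    "    error_message = str(err)\n",
    "\n",
    "# na_action='ignore' leaves NaN untouched and maps only the other values\n",
    "upper = s.map(lambda x: x.upper(), na_action='ignore')\n",
    "print(error_message)\n",
    "print(upper.tolist())\n",
    "```"
   ]
  },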
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.series.Series'>\n",
      "<class 'pandas.core.series.Series'>\n",
      "<class 'pandas.core.series.Series'>\n",
      "<class 'pandas.core.series.Series'>\n",
      "<class 'pandas.core.series.Series'>\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>100216.0</td>\n",
       "      <td>101392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>100273.0</td>\n",
       "      <td>101447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>100052.0</td>\n",
       "      <td>100337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>100217.0</td>\n",
       "      <td>100903.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>100184.0</td>\n",
       "      <td>100473.07</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>100072.0</td>\n",
       "      <td>101051.73</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       num1       num2\n",
       "0  100216.0  101392.68\n",
       "1  100273.0  101447.17\n",
       "3  100052.0  100337.27\n",
       "5  100217.0  100903.29\n",
       "6  100184.0  100473.07\n",
       "7  100072.0  101051.73"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
    "\n",
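    "# apply and applymap on a DataFrame.\n",
    "# With the default axis=0, apply passes each column to the function as a Series, so the type is printed once per column.\n",
    "df.apply(lambda x: print(type(x)))\n",
    "\n",
    "# Compute each column's mean ourselves (not displayed, because it is not the last expression in the cell).\n",
    "df[[\"num1\", \"num2\"]].apply(lambda x: x.mean())\n",
    "\n",
    "# applymap calls the function once for every element of the DataFrame and uses its return value as the mapped result.\n",
    "# It is an element-wise mapping, similar to Series.map; since pandas 2.1 it is deprecated in favor of DataFrame.map.\n",
    "df1 = df[[\"num1\", \"num2\"]].applymap(lambda x: x + 100000)\n",
    "# NaN + 100000 is still NaN, so dropna removes the rows whose num1 or num2 was missing.\n",
    "display(df1.dropna().head(6))"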
   ]
  },
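  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above uses the default axis=0, where apply receives columns. Below is a minimal sketch on a small stand-in for spider.csv, using num1/num2 values taken from the output above. It shows axis=1, where apply receives each row as a Series. It also shows an element-wise mapping that works both before and after pandas 2.1 renamed applymap to DataFrame.map.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Stand-in for spider.csv: two complete rows and one row with missing values\n",
    "df = pd.DataFrame({'num1': [216.0, 273.0, None],\n",
    "                   'num2': [1392.68, 1447.17, None]})\n",
    "\n",
    "# axis=0 (default): each column arrives as a Series; mean() skips NaN\n",
    "col_means = df.apply(lambda col: col.mean())\n",
    "# axis=1: each row arrives as a Series indexed by the column names\n",
    "row_sums = df.apply(lambda row: row['num1'] + row['num2'], axis=1)\n",
    "\n",
    "# Element-wise: DataFrame.map exists from pandas 2.1; older versions only have applymap\n",
    "elementwise = df.map if hasattr(pd.DataFrame, 'map') else df.applymap\n",
    "shifted = elementwise(lambda x: x + 100000)\n",
    "print(col_means)\n",
    "print(row_sums)\n",
    "print(shifted)\n",
    "```"
   ]
  },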
  {
   "cell_type": "code",
   "execution_count": 149,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                name   num1     num2  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1391</th>\n",
       "      <td>2015-7-31</td>\n",
       "      <td>http://beijing.faxinxi.cn/</td>\n",
       "      <td>同道中人;张国荣;80;励志</td>\n",
       "      <td>87.0</td>\n",
       "      <td>927.30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1392</th>\n",
       "      <td>2015-4-20</td>\n",
       "      <td>http://www.denghuo.com/</td>\n",
       "      <td>忘记拥抱;a-lin;80;伤感</td>\n",
       "      <td>31.0</td>\n",
       "      <td>684.56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1393</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://www.yifawang.cn/</td>\n",
       "      <td>路...一直都在;陈奕迅;90;励志</td>\n",
       "      <td>47.0</td>\n",
       "      <td>1419.74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1394</th>\n",
       "      <td>2015-4-15</td>\n",
       "      <td>http://www.wuhan58.com/index.php</td>\n",
       "      <td>像我一样骄傲;赵传;80;励志</td>\n",
       "      <td>124.0</td>\n",
       "      <td>1434.67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1395</th>\n",
       "      <td>2015-5-16</td>\n",
       "      <td>http://www.favolist.com/</td>\n",
       "      <td>最冷一天;陈奕迅;90;伤感</td>\n",
       "      <td>103.0</td>\n",
       "      <td>1020.50</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           date                               url                name   num1  \\\n",
       "1391  2015-7-31        http://beijing.faxinxi.cn/      同道中人;张国荣;80;励志   87.0   \n",
       "1392  2015-4-20           http://www.denghuo.com/    忘记拥抱;a-lin;80;伤感   31.0   \n",
       "1393   2015-4-2           http://www.yifawang.cn/  路...一直都在;陈奕迅;90;励志   47.0   \n",
       "1394  2015-4-15  http://www.wuhan58.com/index.php     像我一样骄傲;赵传;80;励志  124.0   \n",
       "1395  2015-5-16          http://www.favolist.com/      最冷一天;陈奕迅;90;伤感  103.0   \n",
       "\n",
       "         num2  \n",
       "1391   927.30  \n",
       "1392   684.56  \n",
       "1393  1419.74  \n",
       "1394  1434.67  \n",
       "1395  1020.50  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>941</th>\n",
       "      <td>2015-12-10</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>210</th>\n",
       "      <td>2015-7-3</td>\n",
       "      <td>http://beijing.faxinxi.cn/</td>\n",
       "      <td>太阳星辰;张学友;80;励志</td>\n",
       "      <td>135.0</td>\n",
       "      <td>1358.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>480</th>\n",
       "      <td>2015-6-24</td>\n",
       "      <td>http://info.tianya.cn</td>\n",
       "      <td>加油;林俊杰/mc hotdog;90;励志</td>\n",
       "      <td>191.0</td>\n",
       "      <td>1135.37</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>402</th>\n",
       "      <td>2015-5-7</td>\n",
       "      <td>http://info.tianya.cn</td>\n",
       "      <td>借过;容祖儿;90;伤感</td>\n",
       "      <td>39.0</td>\n",
       "      <td>960.86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>940</th>\n",
       "      <td>2015-12-21</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《万物生长》;2015.4.17;2015.5.24;北京劳雷影业、杭州果麦文化传媒、北京联...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           date                         url  \\\n",
       "941  2015-12-10   http://www.movie.com/dor/   \n",
       "210    2015-7-3  http://beijing.faxinxi.cn/   \n",
       "480   2015-6-24       http://info.tianya.cn   \n",
       "402    2015-5-7       http://info.tianya.cn   \n",
       "940  2015-12-21   http://www.movie.com/dor/   \n",
       "\n",
       "                                                  name   num1     num2  \n",
       "941  《天将雄师》;2015.2.19;2015.4.6;耀莱文化，华谊兄弟，上海电影集团;李仁港...    NaN      NaN  \n",
       "210                                     太阳星辰;张学友;80;励志  135.0  1358.12  \n",
       "480                             加油;林俊杰/mc hotdog;90;励志  191.0  1135.37  \n",
       "402                                       借过;容祖儿;90;伤感   39.0   960.86  \n",
       "940  《万物生长》;2015.4.17;2015.5.24;北京劳雷影业、杭州果麦文化传媒、北京联...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url\n",
       "0   2015-4-28    http://www.apinpai.com/\n",
       "1   2015-8-24    http://www.apinpai.com/\n",
       "2  2015-12-14  http://www.movie.com/dor/\n",
       "3    2015-4-2       http://bj.qu114.com/\n",
       "4  2015-12-19  http://www.movie.com/dor/"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
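    "# Show the first n rows; n defaults to 5.\n",
    "display(df.head())\n",
    "# Show the last n rows; n defaults to 5.\n",
    "display(df.tail())\n",
    "# Random sampling: pick n rows at random; n defaults to 1. The rows differ on every run.\n",
    "display(df.sample(5))\n",
    "\n",
    "# A small DataFrame (first 5 rows of date and url) used by the replace examples that follow.\n",
    "s = df[[\"date\", \"url\"]].head()\n",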
    "display(s)"
   ]
  },
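  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because sample draws different rows on every run, its results are hard to reproduce. Below is a minimal sketch on a hypothetical 10-row DataFrame. Passing `random_state` fixes the seed, so repeated calls return the same rows. The `frac` parameter samples a fraction of the rows instead of a fixed count.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical DataFrame with rows 0..9\n",
    "df = pd.DataFrame({'x': range(10)})\n",
    "\n",
    "first3 = df.head(3)  # rows 0, 1, 2\n",
    "last2 = df.tail(2)   # rows 8, 9\n",
    "# The same random_state gives the same sample every time\n",
    "a = df.sample(3, random_state=0)\n",
    "b = df.sample(3, random_state=0)\n",
    "# frac=0.5 draws half of the rows\n",
    "half = df.sample(frac=0.5, random_state=0)\n",
    "print(a.equals(b))  # True\n",
    "print(len(half))    # 5\n",
    "```"
   ]
  },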
  {
   "cell_type": "code",
   "execution_count": 155,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url\n",
       "0   2015-5-28    http://www.apinpai.com/\n",
       "1   2015-8-24    http://www.apinpai.com/\n",
       "2  2015-12-14  http://www.movie.com/dor/\n",
       "3    2015-4-2       http://bj.qu114.com/\n",
       "4  2015-12-19  http://www.movie.com/dor/"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-5-28</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url\n",
       "0   2015-5-28    http://www.apinpai.com/\n",
       "1   2015-5-28    http://www.apinpai.com/\n",
       "2   2015-5-28  http://www.movie.com/dor/\n",
       "3    2015-4-2       http://bj.qu114.com/\n",
       "4  2015-12-19  http://www.movie.com/dor/"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-29</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-25</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url\n",
       "0   2015-4-29    http://www.apinpai.com/\n",
       "1   2015-8-25    http://www.apinpai.com/\n",
       "2  2015-12-14  http://www.movie.com/dor/\n",
       "3    2015-4-2       http://bj.qu114.com/\n",
       "4  2015-12-19  http://www.movie.com/dor/"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-29</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url\n",
       "0   2015-4-29    http://www.apinpai.com/\n",
       "1   2015-8-24    http://www.apinpai.com/\n",
       "2  2015-12-14  http://www.movie.com/dor/\n",
       "3    2015-4-2       http://bj.qu114.com/\n",
       "4  2015-12-19  http://www.movie.com/dor/"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2017</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url\n",
       "0        2017    http://www.apinpai.com/\n",
       "1   2015-8-24    http://www.apinpai.com/\n",
       "2  2015-12-14  http://www.movie.com/dor/\n",
       "3    2015-4-2       http://bj.qu114.com/\n",
       "4  2015-12-19  http://www.movie.com/dor/"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
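    "# The argument to replace can be a single value.\n",
    "s1 = s.replace(\"2015-4-28\", \"2015-5-28\")\n",
    "display(s1)\n",
    "\n",
    "# replace also accepts a list: every element of the list is replaced with 2015-5-28 (the value passed as the value parameter).\n",
    "s2 = s.replace([\"2015-4-28\", \"2015-8-24\", \"2015-12-14\"], \"2015-5-28\")\n",
    "display(s2)\n",
    "\n",
    "# Replace several values, each with its own new value (two lists of equal length, matched by position).\n",
    "s3 = s.replace([\"2015-4-28\", \"2015-8-24\"], [\"2015-4-29\", \"2015-8-25\"])\n",
    "display(s3)\n",
    "\n",
    "# replace also accepts a dict for replacing several values with different ones: each key is a value to replace, and its value is the replacement.\n",
    "s4 = s.replace({\"2015-4-28\":\"2015-4-29\", \"2015-8-24\":\"2015-8-25\"})\n",
    "display(s4)\n",
    "\n",
    "# replace also supports regular expressions, the most flexible option.\n",
    "# Note: pattern matching only happens when regex=True is set (the default is False).\n",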
    "s5 = s.replace(r\"[0-9]{4}-4-28\", \"2017\", regex=True)\n",
    "display(s5)"
   ]
  },
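  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two details of replace are easy to miss. Without regex, only cells whose whole value equals to_replace are replaced. With regex=True, every matching substring is substituted (as with re.sub), so groups captured by the pattern can be reused in the replacement. Separately, a nested dict {column: {old: new}} limits the replacement to one column. Below is a minimal sketch on a hypothetical two-row DataFrame:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical sample rows modeled on the date/url columns above\n",
    "df = pd.DataFrame({'date': ['2015-4-28', '2015-8-24'],\n",
    "                   'url': ['http://www.apinpai.com/', 'http://bj.qu114.com/']})\n",
    "\n",
    "# No cell equals '2015' exactly, so nothing is replaced\n",
    "exact = df.replace('2015', '2016')\n",
    "# With regex=True the matching part of each string is substituted\n",
    "sub = df.replace(r'^2015', '2016', regex=True)\n",
    "# Nested dict: look for '2015-4-28' only in column 'date'\n",
    "only_date = df.replace({'date': {'2015-4-28': '2034'}})\n",
    "# Groups captured by the pattern are reused in the replacement string\n",
    "reordered = df['date'].replace(r'^([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})$', r'\\3/\\2/\\1', regex=True)\n",
    "print(exact.equals(df))            # True\n",
    "print(sub['date'].tolist())        # ['2016-4-28', '2016-8-24']\n",
    "print(only_date['date'].tolist())  # ['2034', '2015-8-24']\n",
    "print(reordered.tolist())          # ['28/4/2015', '24/8/2015']\n",
    "```"
   ]
  },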
  {
   "cell_type": "code",
   "execution_count": 157,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2034</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>abcd</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>abcd</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                      url  \\\n",
       "0        2034  http://www.apinpai.com/   \n",
       "1   2015-8-24  http://www.apinpai.com/   \n",
       "2  2015-12-14                     abcd   \n",
       "3    2015-4-2     http://bj.qu114.com/   \n",
       "4  2015-12-19                     abcd   \n",
       "\n",
       "                                                name   num1     num2  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "execution_count": 157,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
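    "# replace also works on a whole DataFrame: in every column, each value in the first list is replaced by the value at the same position in the second list.\n",
    "df.replace([\"2015-4-28\", \"http://www.movie.com/dor/\"], [\"2034\", \"abcd\"]).head()"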
   ]
  },
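  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "replace returns a new object and leaves the original unchanged. When both arguments are lists, they must have the same length. Below is a minimal sketch on a hypothetical one-row DataFrame:\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical one-row DataFrame\n",
    "df = pd.DataFrame({'date': ['2015-4-28'], 'url': ['http://www.movie.com/dor/']})\n",
    "out = df.replace(['2015-4-28', 'http://www.movie.com/dor/'], ['2034', 'abcd'])\n",
    "\n",
    "# The original DataFrame still holds the old values\n",
    "print(df.iloc[0].tolist())   # ['2015-4-28', 'http://www.movie.com/dor/']\n",
    "print(out.iloc[0].tolist())  # ['2034', 'abcd']\n",
    "\n",
    "# Lists of different lengths are rejected with a ValueError\n",
    "length_error = ''\n",
    "try:\n",
    "    df.replace(['2015-4-28', 'http://www.movie.com/dor/'], ['2034'])\n",
    "except ValueError as err:\n",
    "    length_error = str(err)\n",
    "print(length_error)\n",
    "```"
   ]
  },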
  {
   "cell_type": "code",
   "execution_count": 164,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0          2034\n",
       "1     2015-8-24\n",
       "2    2015-12-14\n",
       "3      2015-4-2\n",
       "4    2015-12-19\n",
       "Name: date, dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
    "\n",
    "def t(item):\n",
    "    return \"2034\" if item == \"2015-4-28\" else item\n",
    "\n",
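    "# The same replacement can also be done with apply or map.\n",
    "# A dict such as {\"2015-4-28\": \"2034\"} is not a drop-in substitute here: map turns every value missing from the dict into NaN, so a function that returns other values unchanged is used instead.\n",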
    "display(df[\"date\"].map(t).head())"
   ]
  },
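  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above passes a function to map rather than a dict. Below is a minimal sketch with hypothetical dates that compares three approaches. replace keeps unmatched values. map with a dict turns unmatched values into NaN. map with a function that falls back to its input gives the same result as replace.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical dates and a one-entry mapping\n",
    "dates = pd.Series(['2015-4-28', '2015-8-24', '2015-12-14'])\n",
    "mapping = {'2015-4-28': '2034'}\n",
    "\n",
    "via_replace = dates.replace(mapping)  # unmatched values are kept\n",
    "via_dict_map = dates.map(mapping)     # unmatched values become NaN\n",
    "via_func_map = dates.map(lambda d: mapping.get(d, d))  # falls back to the original value\n",
    "print(via_replace.tolist())              # ['2034', '2015-8-24', '2015-12-14']\n",
    "print(via_dict_map.isna().tolist())      # [False, True, True]\n",
    "print(via_func_map.equals(via_replace))  # True\n",
    "```"
   ]
  },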
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1\n",
       "1    2\n",
       "2    3\n",
       "dtype: int64"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<pandas.core.strings.StringMethods at 0xa3a44a8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "<pandas.core.strings.StringMethods at 0xa3a44a8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0     ABC\n",
       "1     DEF\n",
       "2    KEFE\n",
       "dtype: object"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/plain": [
       "0     True\n",
       "1    False\n",
       "2    False\n",
       "dtype: bool"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
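    "# The str accessor of a Series\n",
    "s = pd.Series([1, 2, 3])\n",
    "display(s)\n",
    "\n",
    "# Note: the str accessor requires the Series elements to be strings; s.str on the integer Series above would raise an AttributeError.\n",
    "s = pd.Series([\"abc\", \"def\", \"kefe\"])\n",
    "display(s.str)\n",
    "\n",
    "# s.str is a pandas.core.strings.StringMethods object. It provides many methods (similar to those of Python's built-in str) that perform vectorized string operations on every element.\n",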
    "display(s.str)\n",
    "display(s.str.upper())\n",
    "display(s.str.contains(\"b\"))"
   ]
  },
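  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The sketch below illustrates these points with self-contained sample data:\n",
    "* Calling .str on an integer Series raises an AttributeError.\n",
    "* Converting with astype(str) first makes the accessor available.\n",
    "* StringMethods such as upper, contains, len and zfill apply the corresponding string operation to each element."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "nums = pd.Series([1, 2, 3])\n",
    "error_message = None\n",
    "try:\n",
    "    nums.str  # the str accessor needs string values\n",
    "except AttributeError as exc:\n",
    "    error_message = str(exc)\n",
    "print(error_message)\n",
    "\n",
    "# Converting to strings first makes the accessor available\n",
    "as_text = nums.astype(str)\n",
    "print(as_text.str.zfill(3).tolist())\n",
    "\n",
    "words = pd.Series([\"abc\", \"def\", \"kefe\"])\n",
    "print(words.str.upper().tolist())\n",
    "print(words.str.contains(\"b\").tolist())\n",
    "print(words.str.len().tolist())"
   ]
  },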
  {
   "cell_type": "code",
   "execution_count": 174,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                name   num1     num2  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
    "display(df.head())\n",
    "\n",
    "# df2 = df[\"url\"].str.startswith(\"http://www.movie.com/dor/\")\n",
    "# display(df2)\n",
    "\n",
    "# Select (filter) all movie-related records\n",
    "# t = df[df[\"url\"].str.startswith(\"http://www.movie.com/dor/\")]\n",
    "# sp = t[\"name\"].str.split(\";\")\n",
    "\n",
    "# To split one column into several columns, pass expand=True to split. Without expand (the default is False), each element becomes a list holding the split parts.\n",
    "# sp = t[\"name\"].str.split(\";\", expand=True)\n",
    "# sp.info()\n",
    "\n",
    "# Note: after the replacement the values look numeric, but they are still strings, so the dtype of the Series does not change.\n",
    "# sp[7] = sp[7].str.replace(\"票房（万）\", \"\", regex=False)\n",
    "\n",
    "# Convert to the type we need (float)\n",
    "# sp[7] = sp[7].astype(np.float64)\n",
    "# display(sp[7].mean())"
   ]
  },
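  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can try the split / replace / astype steps above on two hypothetical records that use the same semicolon-separated layout. The movie names and numbers are made up; only the `票房（万）` label comes from the comments above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical records in the same \"field;field;...\" layout as the name column\n",
    "s = pd.Series([\"movie A;2015;票房（万）120.5\", \"movie B;2016;票房（万）80.5\"])\n",
    "\n",
    "# Without expand: each element becomes a list of the parts\n",
    "as_lists = s.str.split(\";\")\n",
    "\n",
    "# With expand=True: the parts become separate columns 0, 1, 2\n",
    "parts = s.str.split(\";\", expand=True)\n",
    "\n",
    "# After removing the label the values look numeric but are still strings\n",
    "text_values = parts[2].str.replace(\"票房（万）\", \"\", regex=False)\n",
    "print(text_values.dtype)\n",
    "\n",
    "# Convert to float before computing statistics\n",
    "box_office = text_values.astype(np.float64)\n",
    "print(box_office.mean())"
   ]
  },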
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
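    "# 5. Combining Data\n",
    "## concat\n",
    "The top-level pandas function pd.concat joins (stacks) DataFrame or Series objects along an axis. Labels on the other axis are aligned, and positions that cannot be aligned are filled with NaN.\n",
    "* axis: the axis to concatenate along, default 0 (stack rows vertically); 1 places the objects side by side.\n",
    "* join: how to combine the labels on the other axis, default outer. [outer: union, inner: intersection]\n",
    "* keys: labels that distinguish the groups of data (they become the outer level of a hierarchical index).\n",
    "* join_axes: the labels to keep on the other axis of the result. Deprecated in pandas 0.25 and removed in 1.0; in newer versions, call reindex on the result instead.\n",
    "* ignore_index: discard the original labels along the concatenation axis and create a new integer index; default False."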
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>url</th>\n",
       "      <th>name</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>采蘑菇的小姑娘;小蓓蕾组合;90;儿歌</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>http://www.apinpai.com/</td>\n",
       "      <td>我;张国荣;80;励志</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>http://bj.qu114.com/</td>\n",
       "      <td>my way;张敬轩;90;励志</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>http://www.movie.com/dor/</td>\n",
       "      <td>《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date                        url  \\\n",
       "0   2015-4-28    http://www.apinpai.com/   \n",
       "1   2015-8-24    http://www.apinpai.com/   \n",
       "2  2015-12-14  http://www.movie.com/dor/   \n",
       "3    2015-4-2       http://bj.qu114.com/   \n",
       "4  2015-12-19  http://www.movie.com/dor/   \n",
       "\n",
       "                                                name   num1     num2  \n",
       "0                                采蘑菇的小姑娘;小蓓蕾组合;90;儿歌  216.0  1392.68  \n",
       "1                                        我;张国荣;80;励志  273.0  1447.17  \n",
       "2  《恶棍天使》;2015.12.24;2016.2.13;天津橙子映像传媒有限公司、北京光线影...    NaN      NaN  \n",
       "3                                   my way;张敬轩;90;励志   52.0   337.27  \n",
       "4  《失孤》;2015.3.20;2015.5.3;华谊兄弟传媒集团、源合圣影视、映艺娱乐;彭三...    NaN      NaN  "
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "df = pd.read_csv(\"spider.csv\", header=None)\n",
    "df.columns = [\"date\", \"url\", \"name\", \"num1\", \"num2\"]\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
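    "## append\n",
    "To concatenate rows, you can also use the append method of Series or DataFrame. Note that append was deprecated in pandas 1.4 and removed in 2.0; in newer versions use pd.concat instead."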
   ]
  },
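  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "append is gone in pandas 2.0, so the sketch below shows the pd.concat equivalent of appending rows. The two small frames reuse a few date / num1 values from the table above, purely as sample data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# A few rows copied from the table above (date / num1 only)\n",
    "first = pd.DataFrame({\"date\": [\"2015-4-28\", \"2015-8-24\"], \"num1\": [216.0, 273.0]})\n",
    "extra = pd.DataFrame({\"date\": [\"2015-4-2\"], \"num1\": [52.0]})\n",
    "\n",
    "# pandas < 2.0 also allowed: first.append(extra, ignore_index=True)\n",
    "# append was removed in pandas 2.0, so pd.concat is the portable choice\n",
    "combined = pd.concat([first, extra], ignore_index=True)\n",
    "print(combined)"
   ]
  },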
  {
   "cell_type": "code",
   "execution_count": 206,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>date</th>\n",
       "      <th>num1</th>\n",
       "      <th>num2</th>\n",
       "      <th>num3</th>\n",
       "      <th>num4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2015-4-28</td>\n",
       "      <td>216.0</td>\n",
       "      <td>1392.68</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2015-8-24</td>\n",
       "      <td>273.0</td>\n",
       "      <td>1447.17</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2015-12-14</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>52.0</td>\n",
       "      <td>337.27</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>2015-12-19</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>2015-7-31</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>927.30</td>\n",
       "      <td>87.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>2015-4-20</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>684.56</td>\n",
       "      <td>31.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2015-4-2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1419.74</td>\n",
       "      <td>47.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2015-4-15</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1434.67</td>\n",
       "      <td>124.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2015-5-16</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1020.50</td>\n",
       "      <td>103.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         date   num1     num2     num3   num4\n",
       "0   2015-4-28  216.0  1392.68      NaN    NaN\n",
       "1   2015-8-24  273.0  1447.17      NaN    NaN\n",
       "2  2015-12-14    NaN      NaN      NaN    NaN\n",
       "3    2015-4-2   52.0   337.27      NaN    NaN\n",
       "4  2015-12-19    NaN      NaN      NaN    NaN\n",
       "5   2015-7-31    NaN      NaN   927.30   87.0\n",
       "6   2015-4-20    NaN      NaN   684.56   31.0\n",
       "7    2015-4-2    NaN      NaN  1419.74   47.0\n",
       "8   2015-4-15    NaN      NaN  1434.67  124.0\n",
       "9   2015-5-16    NaN      NaN  1020.50  103.0"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "head = df[[\"date\", \"num1\", \"num2\"]].head()\n",
    "tail = df[[\"date\", \"num1\", \"num2\"]].tail()\n",
    "tail.columns = [\"date\", \"num4\", \"num3\"]\n",
    "# display(head, tail)\n",
    "\n",
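    "# When stacking with concat, pandas aligns on the labels; positions that cannot be aligned are filled with NaN.\n",
    "# display(pd.concat((head, tail), sort=False))\n",
    "\n",
    "# axis sets the stacking direction: 0 stacks vertically (rows), 1 horizontally (columns).\n",
    "# display(pd.concat((head, tail), axis=0, sort=False))\n",
    "\n",
    "# join sets the join type (outer: the result keeps the union of labels; inner: the intersection).\n",
    "# display(pd.concat((head, tail), join=\"inner\"))\n",
    "\n",
    "# keys shows which table each row came from (this creates a hierarchical index).\n",
    "# display(pd.concat((head, tail), keys=[\"head\", \"tail\"]))\n",
    "\n",
    "# join_axes sets the labels to keep (pandas < 1.0 only; the parameter was removed in 1.0).\n",
    "# display(pd.concat((head, tail), join_axes=[head.columns]))\n",
    "# display(pd.concat((head, tail), join_axes=[tail.columns]))\n",
    "# In newer pandas, reindex the result instead:\n",
    "# display(pd.concat((head, tail), sort=False).reindex(columns=head.columns))\n",
    "\n",
    "# Set ignore_index=True to discard the old index and create a new consecutive one.\n",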
    "display(pd.concat((head, tail), ignore_index=True, sort=True))"
   ]
  },
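  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The commented-out calls above depend on spider.csv. The following self-contained sketch uses two small hypothetical frames whose columns only partly overlap. It exercises the same concat parameters, including a reindex call that can stand in for the removed join_axes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Two small hypothetical frames whose columns only partly overlap\n",
    "left_part = pd.DataFrame({\"date\": [\"d1\", \"d2\"], \"num1\": [1.0, 2.0]})\n",
    "right_part = pd.DataFrame({\"date\": [\"d3\"], \"num3\": [3.0]})\n",
    "\n",
    "# Default: outer join on the columns; missing cells become NaN\n",
    "outer = pd.concat([left_part, right_part], sort=False)\n",
    "\n",
    "# join=\"inner\": keep only the columns present in every input\n",
    "inner = pd.concat([left_part, right_part], join=\"inner\")\n",
    "\n",
    "# keys: the outer level of the resulting MultiIndex records the source\n",
    "labeled = pd.concat([left_part, right_part], keys=[\"left\", \"right\"], sort=False)\n",
    "\n",
    "# Stand-in for the removed join_axes=[left_part.columns]\n",
    "kept = pd.concat([left_part, right_part], sort=False).reindex(columns=left_part.columns)\n",
    "\n",
    "# axis=1: place the frames side by side, aligning on the row index\n",
    "side_by_side = pd.concat([left_part, right_part], axis=1)\n",
    "print(side_by_side)"
   ]
  },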
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
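    "## merge\n",
    "The pd.merge function, or the DataFrame.merge method, joins two DataFrames in a way similar to a join between two tables in SQL.  \n",
    "* how: the join type; one of inner, outer, left, right. Default inner.\n",
    "* on: the column(s) to join on (they must appear in both DataFrames). By default, all columns that have the same name in both DataFrames are used.\n",
    "* left_on / right_on: the columns to join on in the left and right DataFrame, respectively.\n",
    "* left_index / right_index: whether to use the index of the left (right) DataFrame as the join key. Default False.\n",
    "* suffixes: when the two DataFrames have columns with the same name, the suffixes appended to tell them apart. Default _x and _y."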
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame([[100, 2, 3], [3, 4, 5], [7, 8, 9]])\n",
    "df2 = pd.DataFrame([[1, 2, 4], [3, 4, 6], [10, 11, 12]], columns=[0, 1, 3])\n",
    "display(df, df2)\n",
    "# Equi-join on all columns (labels) that the two tables share.\n",
    "# df.merge(df2)\n",
    "# how sets the join type; the default is an inner join.\n",
    "# df.merge(df2, how=\"left\")\n",
    "# df.merge(df2, how=\"right\")\n",
    "# df.merge(df2, how=\"outer\")\n",
    "\n",
    "# on selects the join column(s), which must appear in both tables. (By default, all same-named columns are used for the equi-join.)\n",
    "# display(df.merge(df2))\n",
    "# display(df.merge(df2, on=1))\n",
    "\n",
    "# If the key columns have different names, use left_on and right_on to name the key column of the left and right table, respectively.\n",
    "# df.merge(df2, left_on=1, right_on=3)\n",
    "\n",
    "# left_index / right_index set whether the index serves as the join key (True: yes, False: no).\n",
    "# Note: left_index (right_index) and left_on (right_on) cannot be specified together.\n",
    "# df.merge(df2, left_index=True, right_index=True)\n",
    "# df.merge(df2, left_index=True, right_on=1)\n",
    "\n",
    "# You can also customize the suffixes, which are used when both tables have columns with the same name (default _x, _y).\n",
    "# df.merge(df2, left_index=True, right_index=True, suffixes=[\"_table1\", \"_table2\"])"
   ]
  },
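  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With integer column labels, the merge keys above are hard to follow. The sketch below repeats the main merge options using named columns. The student / score tables are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical tables: students and their scores\n",
    "students = pd.DataFrame({\"sid\": [1, 2, 3], \"name\": [\"a\", \"b\", \"c\"]})\n",
    "scores = pd.DataFrame({\"student_id\": [1, 2, 4], \"score\": [90, 85, 70]})\n",
    "\n",
    "# The key columns have different names, so use left_on / right_on\n",
    "matched = students.merge(scores, left_on=\"sid\", right_on=\"student_id\")\n",
    "all_students = students.merge(scores, left_on=\"sid\", right_on=\"student_id\", how=\"left\")\n",
    "everything = students.merge(scores, left_on=\"sid\", right_on=\"student_id\", how=\"outer\")\n",
    "\n",
    "# Overlapping non-key column names get the given suffixes\n",
    "t1 = pd.DataFrame({\"key\": [1, 2], \"value\": [10, 20]})\n",
    "t2 = pd.DataFrame({\"key\": [1, 2], \"value\": [100, 200]})\n",
    "both = t1.merge(t2, on=\"key\", suffixes=(\"_a\", \"_b\"))\n",
    "print(both)"
   ]
  },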
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
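    "## join\n",
    "Similar to merge, but joins on the index by default.\n",
    "* how: the join type; one of inner, outer, left, right. Default left.\n",
    "* on: the column of the calling DataFrame to match against the index of the other DataFrame.\n",
    "* lsuffix / rsuffix: suffixes for overlapping column names in the left and right DataFrame. If the two frames share a column name and no suffix is given, join raises an error."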
   ]
  },
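  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is a minimal sketch of these parameters, using hypothetical orders and customers tables:\n",
    "* on matches a column of the calling frame against the index of the other frame.\n",
    "* The default how=\"left\" keeps every row of the caller.\n",
    "* Overlapping column names without lsuffix / rsuffix raise a ValueError."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Hypothetical tables: orders refer to customers by id\n",
    "orders = pd.DataFrame({\"customer\": [10, 20, 30], \"amount\": [5.0, 7.5, 2.0]})\n",
    "customers = pd.DataFrame({\"name\": [\"x\", \"y\"]}, index=[10, 20])\n",
    "\n",
    "# on=\"customer\": match this column of orders against the index of customers.\n",
    "# The default how=\"left\" keeps every order; customer 30 gets NaN.\n",
    "joined = orders.join(customers, on=\"customer\")\n",
    "\n",
    "# Overlapping column names need lsuffix / rsuffix, otherwise join raises ValueError\n",
    "left_v = pd.DataFrame({\"v\": [1, 2]})\n",
    "right_v = pd.DataFrame({\"v\": [3, 4]})\n",
    "overlap_error = None\n",
    "try:\n",
    "    left_v.join(right_v)\n",
    "except ValueError as exc:\n",
    "    overlap_error = str(exc)\n",
    "suffixed = left_v.join(right_v, lsuffix=\"_x\", rsuffix=\"_y\")\n",
    "print(joined)\n",
    "print(suffixed)"
   ]
  },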
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "df = pd.DataFrame([[100, 2, 3], [3, 4, 5], [7, 8, 9]])\n",
    "df2 = pd.DataFrame([[1, 2, 4], [3, 4, 6], [10, 11, 12]], columns=[0, 1, 3], index=[0, 1, 3])\n",
    "display(df, df2)\n",
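    "# Like merge, join combines two tables.\n",
    "# Differences:\n",
    "# 1. merge performs an inner join by default; join performs a left (outer) join.\n",
    "# 2. When column labels overlap, merge adds suffixes automatically (_x, _y); join does not and raises an error instead.\n",
    "# 3. merge joins on the same-named columns by default; join joins on the indexes of both tables.\n",
    "# 4. merge's on names a column present in both tables, while join's on names a column of the left table only (the right table still uses its index).\n",
    "\n",
    "# Without an explicit join type, join performs a left join\n",
    "# df.join(df2, lsuffix=\"_x\", rsuffix=\"_y\")\n",
    "# how sets the join type.\n",
    "# df.join(df2, lsuffix=\"_x\", rsuffix=\"_y\", how=\"outer\")\n",
    "\n",
    "# on sets which column of the calling table is matched against the index of the argument (right) table.\n",
    "df.join(df2, lsuffix=\"_x\", rsuffix=\"_y\", on=0)\n",
    "\n",
    "# The two methods have different focuses: merge mainly joins on columns, while join mainly joins on the index."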
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
